Abstract

Current cone jet algorithms, widely used at hadron colliders, take event particles 
as seeds in an iterative search for stable cones. A longstanding infrared (IR) unsafety 
issue in such algorithms is often assumed to be solvable by adding extra 'midpoint' 
seeds, but actually is just postponed to one order higher in the coupling. A proper 
solution is to switch to an exact seedless cone algorithm, one that provably identifies 
all stable cones. The only existing approach takes $N 2^N$ time to find jets among
$N$ particles, making it unusable at hadron level. This can be reduced to $N^2 \ln N$
time, leading to code (SISCone) whose speed is similar to that of public midpoint
implementations. Monte Carlo tests provide a strong cross-check of an analytical
proof of the IR safety of the new algorithm, and the absence of any '$R_{sep}$' issue
implies a good practical correspondence between parton and hadron levels. Relative 
to a midpoint cone, the use of an IR safe seedless algorithm leads to modest changes 
for inclusive jet spectra, mostly through reduced sensitivity to the underlying event, 
and significant changes for some multi-jet observables. 

SISCone, the C++ implementation of the algorithm, is available at 
http://projects.hepforge.org/siscone/ (standalone),
http://www.lpthe.jussieu.fr/~salam/fastjet/ (FastJet plugin).
1 Introduction

Two broad classes of jet definition are generally advocated [1] for hadron colliders. One 
option is to use sequential recombination jet algorithms, such as the $k_t$ [2] and
Cambridge/Aachen algorithms [3], which introduce a distance measure between particles, and
repeatedly recombine the closest pair of particles until some stopping criterion is reached.
While experimentally these are starting to be investigated [4, 5], the bulk of measurements
are currently carried out with the other class of jet definition, cone jet algorithms (see e.g. 
[6]). In general there are indications [7] that it may be advantageous to use both sequential 
recombination and cone jet algorithms because of complementary sensitivities to different
classes of non-perturbative corrections. 

Cone jet algorithms are inspired by the idea [8] of defining a jet as an angular cone 
around some direction of dominant energy flow. To find these directions of dominant 
energy flow, cone algorithms usually take some (or all) of the event particles as 'seeds', 
i.e. trial cone directions. Then for each seed they establish the list of particles in the trial 
cone, evaluate the sum of their 4-momenta, and use the resulting 4-momentum as a new 
trial direction for the cone. This procedure is iterated until the cone direction no longer 
changes, i.e. until one has a "stable cone". 

Stable cones have the property that the cone axis a (a four-vector) coincides with the 
(four-vector) axis defined by the total momentum of the particles contained in the cone, 

$$D(p_{\rm in\,cone}, a) = 0, \qquad \text{with} \qquad p_{\rm in\,cone} = \sum_i p_i \, \Theta(R - D(p_i, a)), \qquad (1)$$

where $D(p, a)$ is some measure of angular distance between the four-momentum $p$ and the
cone axis $a$, and $R$ is the given opening (half-)angle of the cone, also referred to as the cone
radius. Typically one defines $D^2(p, a) = (y_p - y_a)^2 + (\phi_p - \phi_a)^2$, where $y_p$, $y_a$ and $\phi_p$, $\phi_a$ are
respectively the rapidity and azimuth of $p$ and $a$.
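
As an illustration of eq. (1), the following minimal Python sketch tests whether a trial axis is stable. It rests on assumptions that are not part of the text: particles are stored as hypothetical (pt, y, phi) tuples, and a $p_t$-weighted average of $y$ and $\phi$ stands in for the full four-momentum sum. The function names are illustrative and do not correspond to the SISCone code.

```python
import math

def delta_phi(a, b):
    """Azimuthal difference a - b wrapped into [-pi, pi)."""
    return (a - b + math.pi) % (2.0 * math.pi) - math.pi

def distance2(y1, phi1, y2, phi2):
    """Squared distance D^2 = (y1 - y2)^2 + (phi1 - phi2)^2."""
    return (y1 - y2) ** 2 + delta_phi(phi1, phi2) ** 2

def cone_axis(particles, axis, R):
    """pt-weighted (y, phi) of the particles within R of `axis` (None if empty).

    Assumption: pt-scheme averaging replaces the four-momentum sum of eq. (1).
    """
    y0, phi0 = axis
    inside = [p for p in particles if distance2(p[1], p[2], y0, phi0) < R * R]
    if not inside:
        return None
    pt_sum = sum(p[0] for p in inside)
    y = sum(p[0] * p[1] for p in inside) / pt_sum
    # Average phi relative to the trial axis to avoid the 2*pi wrap-around.
    phi = phi0 + sum(p[0] * delta_phi(p[2], phi0) for p in inside) / pt_sum
    return y, phi

def is_stable(particles, axis, R, tol=1e-9):
    """True if the trial axis coincides with the axis of its own contents."""
    new_axis = cone_axis(particles, axis, R)
    return new_axis is not None and distance2(*new_axis, *axis) < tol * tol

# Two hard particles, given as (pt, y, phi).
particles = [(400.0, 0.0, 0.0), (110.0, 0.9, 0.0)]
stable_at_centroid = is_stable(particles, (99.0 / 510.0, 0.0), R=1.0)
stable_at_hardest = is_stable(particles, (0.0, 0.0), R=1.0)
```

In this toy configuration the axis placed at the $p_t$-weighted centroid of the two particles is stable, whereas the axis placed on the hardest particle is not, because the cone around it also contains the second particle.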

Two types of problem arise when using seeds as starting points of an iterative search 
for stable cones. On one hand, if one only uses particles above some momentum threshold 
as seeds, then the procedure is collinear unsafe. Alternatively if any particle can act as a 
seed then one needs to be sure that the addition of an infinitely soft particle cannot lead 
to a new (hard) stable cone being found, otherwise the procedure is infrared (IR) unsafe. 

The second of these problems came to the fore in the 1990s [9], when it was realised that
there can be stable cones that have two hard particles on opposing edges of the cone and 
no particles in the middle, e.g. for configurations such as 

$$p_{t1} > p_{t2}, \qquad R < D(p_1, p_2) < (1 + p_{t2}/p_{t1}) R. \qquad (2)$$

In traditional iterative cone algorithms, $p_1$ and $p_2$ each act as seeds and two stable cones
are found, one centred on $p_1$, the other centred on $p_2$. The third stable cone, centred
between $p_1$ and $p_2$ (and containing them both) is not found. If, however, a soft particle
is added between the two hard particles, it too acts as a seed and the third stable cone is 
then found. The set of stable cones (and final jets) is thus different with and without the 
soft particle and there is a resulting non-cancellation of divergent real soft production and 
corresponding virtual contributions, i.e. the algorithm is infrared unsafe. 

Infrared unsafety is a serious issue, not just because it makes it impossible to carry 
out meaningful (finite) perturbative calculations, but also because it breaks the whole 
relation between the (Born or low-order) partonic structure of the event and the jets that 
one observes, and it is precisely this relation that a jet algorithm is supposed to codify: 
it makes no sense for the structure of multi-hundred GeV jets to change radically just 
because hadronisation, the underlying event or pileup threw a 1 GeV particle in between 
them. 
A workaround for the above IR unsafety problem was proposed in [9]: after finding
the stable cones that come from the true seed particles, add artificial "midpoint" seeds 
between pairs of stable cones and search for new stable cones that arise from the midpoint 
seeds. For configurations with two hard particles, the midpoint fix resolved the IR unsafety 
issue. It was thus adopted as a recommendation [6] for Run II of the Tevatron and is now 
coming into use experimentally [10, 11].

Recently, it was observed [12] that in certain triangular three-point configurations there
are stable cones that are not identified even by the midpoint procedure. While these can be 
identified by extended midpoint procedures (e.g. midpoints between triplets of particles) 
[12, 13], in this article (section 3) we show that there exist yet other 3-particle configurations
for which even this fix does not find all stable cones. 

Given this history of infrared safety problems being fixed and new ones being found, 
it seems to us that iterative$^1$ cone algorithms should be abandoned. Instead we believe
that cone jet algorithms should solve the mathematical problem of demonstrably finding
all stable cones, i.e. all solutions to eq. (1). This kind of jet algorithm is referred to as
an exact seedless cone jet algorithm [6] and has been advocated before in [16]. With an
exact seedless algorithm, the addition of one or more soft particles cannot lead to new 
hard stable cones being found, because all hard stable cones have already been (provably) 
found. Therefore the algorithm is infrared safe at all orders. 

Two proposals exist for approximate implementations of the seedless jet algorithm
[6, 17]. They both rely on the event being represented in terms of calorimeter towers,
which is far from ideal when considering parton or hadron-level events. Ref. [6] also proposed
a procedure for an exact seedless jet algorithm, intended for fixed-order calculations,
and implemented for example in the MCFM and NLOJet fixed-order (NLO) codes [18, 19].$^2$
This method takes a time $O(N 2^N)$ to find jets among $N$ particles. While perfectly adequate
for fixed-order calculations ($N \le 4$), a recommendation to extend the use of such
seedless cone implementations more generally would have little chance of being adopted
experimentally: the time to find jets in a single (quiet!) event containing 100 particles
would approach $10^{17}$ years.
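
The scale of the problem can be made concrete with a back-of-the-envelope estimate. The sketch below simply evaluates $N 2^N$ and converts it to years for an assumed processing rate. The rate is a free, hypothetical parameter (the text does not state the cost per elementary operation), so only the order of magnitude of the growth with $N$ should be read from it.

```python
SECONDS_PER_YEAR = 3.156e7  # approximate length of a year in seconds

def exhaustive_cost(n_particles):
    """Number of elementary operations, N * 2^N, of the subset-based search."""
    return n_particles * 2 ** n_particles

def years_needed(n_particles, ops_per_second):
    """Run time in years for an assumed (hypothetical) processing rate."""
    return exhaustive_cost(n_particles) / ops_per_second / SECONDS_PER_YEAR

# Fixed-order multiplicities are trivial; hadron-level multiplicities are not.
cost_fixed_order = exhaustive_cost(4)        # a few tens of operations
years_hadron_level = years_needed(100, 1e9)  # assumes 10^9 operations per second
```

Even with the optimistic assumed rate of $10^9$ operations per second, the estimate for 100 particles exceeds $10^{15}$ years; a larger cost per operation pushes it correspondingly higher.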

Given the crucial importance of infrared safety in allowing one to compare theoretical 
predictions and experimental measurements, and the need for the same algorithm to be 
used in both, there is a strong motivation for finding a more efficient way of implementing 
the seedless cone algorithm. Section 4 will show how this can be done, first in the context
of a simple one-dimensional example (sec. 4.1), then generalising it to two dimensions ($y$,
$\phi$, sec. 4.2) with an approach that can be made to run in polynomial ($N^2 \ln N$) time. As

1 A more appropriate name might be the doubly iterative cone algorithm, since as well as iterating the 
cones, the cone algorithm's definition has itself seen several iterations since its original introduction by 
UA1 in 1983 [14], and even since the Snowmass accord [15], the first attempt to formulate a standard,
infrared and collinear-safe cone-jet definition, over 15 years ago. 

2 Section 3.4.2 of [6] is the source of some confusion regarding nomenclature, because after discussing
both the midpoint and seedless algorithms, it proceeds to show some fixed-order results calculated with 
the seedless algorithm, but labelled as midpoint. Though both algorithms are IR safe up to the order that 
was shown, they would not have given identical results. 
in recent work on speeding up the $k_t$ jet-algorithm [20], the key insights will be obtained
by considering the geometrical aspects of the problem. Section 4.3 will discuss aspects of
the split-merge procedure.

In section 5 we will study a range of physics and practical properties of the seedless
algorithm. Given that the split-merge stage is complex and so yet another potential source 
of infrared unsafety, we will use Monte Carlo techniques to provide independent evidence 
for the safety of the algorithm, supplementing a proof given in appendix B. We will
examine the speed of our coding of the algorithm and see that it is as fast as publicly 
available midpoint codes. We will also study the question of the relation between the low- 
order perturbative characteristics of the algorithm, and its all-order behaviour, notably 
as concerns the '$R_{sep}$' issue [21, 1]. Finally we highlight physics contexts where we see
similarities and differences between our seedless algorithm and the midpoint algorithm. 
For inclusive quantities, such as the inclusive jet spectrum, perturbative differences are of 
the order of a few percent, increasing to 10% at hadron level owing to reduced sensitivity to 
the underlying event in the seedless algorithm. For exclusive quantities we see differences 
of the order of 10-50%, for example for mass spectra in multi-jet events.

2 Overview of the cone jet-finding algorithm 



Algorithm 1 A full specification of a modern cone algorithm, governed by four parameters:
the cone radius $R$, the overlap parameter $f$, the number of passes $N_{pass}$ and a
minimum transverse momentum in the split-merge step, $p_{t,min}$. Throughout, particles are
to be combined by summing their 4-momenta and distances are to be calculated using the
longitudinally invariant $\Delta y$ and $\Delta\phi$ distance measures (where $y$ is the rapidity).

1: Put the set of current particles equal to the set of all particles in the event.
2: repeat
3:    Find all stable cones of radius $R$ (see Eq. (1)) for the current set of particles, e.g.
      using algorithm 2, section 4.2.2.
4:    For each stable cone, create a protojet from the current particles contained in the
      cone, and add it to the list of protojets.
5:    Remove all particles that are in stable cones from the list of current particles.
6: until No new stable cones are found, or one has gone around the loop $N_{pass}$ times.
7: Run a Tevatron Run-II type split-merge procedure [6], algorithm 3 (section 4.3), on
   the full list of protojets, with overlap parameter $f$ and transverse momentum threshold
   $p_{t,min}$.
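
The control flow of steps 1-6 of algorithm 1 can be sketched in a few lines of Python. This is only an outline under stated assumptions: the stable-cone search is passed in as a hypothetical callable `find_stable_cones(particles, current)` returning sets of particle indices, the split-merge step 7 is not included, and none of the names below belong to the SISCone interface.

```python
def find_protojets(particles, find_stable_cones, n_pass=float("inf")):
    """Multi-pass protojet search (steps 1-6 of algorithm 1).

    find_stable_cones(particles, current) is an assumed interface: it must
    return a list of frozensets of indices drawn from `current`.
    """
    current = set(range(len(particles)))    # step 1: all particles are current
    protojets = []
    passes = 0
    while passes < n_pass:                   # step 6: at most n_pass passes
        cones = find_stable_cones(particles, current)   # step 3
        if not cones:                        # step 6: no new stable cones
            break
        protojets.extend(cones)              # step 4: one protojet per cone
        for cone in cones:                   # step 5: remove used particles
            current -= cone
        passes += 1
    return protojets

def toy_finder(particles, current):
    """Hypothetical stand-in for a real search: the hardest remaining
    particle alone forms the only 'stable cone' of each pass."""
    if not current:
        return []
    hardest = max(current, key=lambda i: particles[i])
    return [frozenset([hardest])]

# Toy event: the particles are represented by their transverse momenta only.
all_passes = find_protojets([5.0, 1.0, 3.0], toy_finder)
single_pass = find_protojets([5.0, 1.0, 3.0], toy_finder, n_pass=1)
```

With the default (infinite) number of passes the loop ends once a pass finds no further stable cones, so every particle that can enter a protojet eventually does; limiting the number of passes leaves the remaining particles unassigned.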



Before entering into technical considerations, we outline the structure of a modern cone
jet definition as algorithm 1, largely based on the Tevatron Run-II specification [6]. It is
governed by four parameters. The cone radius $R$ and overlap parameter $f$ are standard
and appeared in previous cone algorithms. The $N_{pass}$ variable is new and embodies the
suggestion in [1] that one should rerun the stable cone search to eliminate dark towers [21],
particle    $p_t$ [GeV]    $y$      $\phi$
1           400            0        0
2           110            0.9R     0
3           90             2.3R     0
4           1.1            1.5R     0

Table 1: Particles 1-3 represent a hard configuration. The jets from this hard configuration 
are modified in the midpoint cone algorithm when one adds the soft particle 4. 

i.e. particles that do not appear in any stable cones (and therefore never appear in jets) 
during a first pass of the algorithm, even though they can correspond to significant energy 
deposits. A sensible default is $N_{pass} = \infty$ since, as formulated, the procedure will in any
case stop once further passes find no further stable cones. The $p_{t,min}$ threshold for the
split-merge step is also an addition relative to the Run II procedure, inspired by [12], [1].
It is discussed in section 4.3 together with the rest of the split-merge procedure and may
be set to zero to recover the original Run II type behaviour, a sensible default.

The main development of this paper is the specification of how to efficiently carry out 
step 3 of algorithm 1. In section 3 we will show that the midpoint approximation for
finding stable cones fails to find them all, leading to infrared unsafety problems. Section 4
will provide a practical solution. Code corresponding to this algorithm is available publicly 
under the name of 'Seedless Infrared Safe Cone' (SISCone). 

3 IR unsafety in the midpoint algorithm 

Until now, the exact exhaustive identification of all stable cones was considered to be too 
computationally complex to be feasible for realistic particle multiplicities. Instead, the 
Tevatron experiments streamline the search for stable cones with the so-called 'midpoint 
algorithm' [9]. Given a seed, the latter calculates the total momentum of the particles 
contained within a cone centred on the seed, uses the direction of this momentum as a new 
seed and iterates until the resulting cone is stable. The initial set of seeds is that of all 
particles whose transverse momentum is above a seed threshold $s$ (one may take $s = 0$ to
obtain a collinear-safe algorithm). Then, one adds a new set of seeds given by all midpoints 
between pairs of stable cones separated by less than 2R and repeats the iterations from 
these midpoint seeds. 

The problem with the midpoint cone algorithm can be seen from the configurations of 
table 1, represented also in fig. 1. Using particles 1-3, there exist three stable cones.
In a $p_t$-scheme recombination procedure (a $p_t$-weighted averaging of $y$ and $\phi$) they are at
$y \simeq \{0.194R, 1.53R, 2.3R\}$.$^3$ Note however that starting from particles 1, 2, 3 as seeds, one
only iterates to the stable cones at $y \simeq 0.194R$ and $y = 2.3R$. Using the midpoint between

3 In a more standard $E$-scheme (four-momentum) recombination procedure the exact numbers depend
slightly on $R$, but the conclusions are unchanged.
Figure 1: Configuration illustrating one of the IR unsafety problems of the midpoint jet
algorithm (R = 1); (a) the stable cones (ellipses) found in the midpoint algorithm; (b) 
with the addition of an arbitrarily soft seed particle (red wavy line) an extra stable cone 
is found. 



these two stable cones, at $y \simeq 1.247R$, one iterates back to the stable cone at $y \simeq 0.194R$,
therefore the stable cone at $y \simeq 1.53R$ is never found. The result is that particles 1 and 2
are in one jet, and particle 3 in another, fig. 1a.

If additionally a soft particle (4) is present to act as a seed near $y \simeq 1.53R$, fig. 1b, then
the stable cone there is found from the iterative procedure. In this case we have three 
overlapping stable cones, with hard-particle content 1 + 2, 2 + 3 and 3. What happens 
next depends on the precise splitting and merging procedure that is adopted. Using that 
of [6] then for $f < 0.55$ the jets are merged into a single large jet 1 + 2 + 3, otherwise they
are split into 1 and 2 + 3. Either way the jets are different from those obtained without 
the extra soft seed particle, meaning that the procedure is infrared unsafe. In contrast, a 
seedless approach would have found the three stable cones independently of the presence 
of the soft particle and so would have given identical sets of jets. 
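
The behaviour described above can be reproduced with a small one-dimensional sketch of the seeded search, using the particles of table 1 with $R = 1$ and the $p_t$-scheme. The code is a simplified illustration rather than any experiment's midpoint implementation: it works in rapidity only, takes every particle as a seed, and (as in the worked example of the text) iterates from the midpoint of every pair of stable cones without applying a separation cut. All function names are hypothetical.

```python
from itertools import combinations

def contents(y, particles, R):
    """Indices of the particles strictly within R of the axis y."""
    return frozenset(i for i, (_, yi) in enumerate(particles) if abs(yi - y) < R)

def centroid(indices, particles):
    """pt-weighted rapidity of a set of particles (pt-scheme)."""
    pt_sum = sum(particles[i][0] for i in indices)
    return sum(particles[i][0] * particles[i][1] for i in indices) / pt_sum

def iterate(seed_y, particles, R, max_iter=100):
    """Iterate a cone from seed_y; return (contents, axis) once stable, else None."""
    inside = contents(seed_y, particles, R)
    for _ in range(max_iter):
        if not inside:
            return None
        y = centroid(inside, particles)
        new_inside = contents(y, particles, R)
        if new_inside == inside:        # the axis reproduces its own contents
            return inside, y
        inside = new_inside
    return None

def midpoint_stable_cones(particles, R=1.0):
    """Stable cones reached from particle seeds, then from midpoint seeds."""
    found = {}
    for _, y in particles:                               # particle seeds
        result = iterate(y, particles, R)
        if result:
            found[result[0]] = result[1]
    for ya, yb in combinations(list(found.values()), 2):  # midpoint seeds
        result = iterate(0.5 * (ya + yb), particles, R)
        if result:
            found[result[0]] = result[1]
    return set(found)

hard = [(400.0, 0.0), (110.0, 0.9), (90.0, 2.3)]   # (pt, y), particles 1-3
with_soft = hard + [(1.1, 1.5)]                     # adds the soft particle 4

cones_hard = midpoint_stable_cones(hard)
cones_soft = midpoint_stable_cones(with_soft)
# Hard-particle content of each cone once the soft particle (index 3) is ignored.
hard_content_soft = {cone - {3} for cone in cones_soft}
```

Without the soft particle the search returns only the cones containing particles 1+2 and 3 (indices {0, 1} and {2}); with it, a third cone with hard content 2+3 appears, which is the change in hard stable cones that signals infrared unsafety.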

The infrared divergence arises for configurations with 3 hard particles in a common 
neighbourhood plus one soft one (and a further hard electroweak boson or QCD parton 
to balance momentum). Quantities where it will be seen include the NLO contribution 
to the heavy-jet mass in W/Z+2-jet (or 3-jet) events, the NNLO contribution to the 
W/Z+2-jet cross section or the 3-jet cross section, or alternatively at NNNLO in the 
inclusive jet cross section. The problem might therefore initially seem remote, since the 
theoretical state of the art is far from calculations of any of these quantities. However 
one should recall that infrared safety at all orders is a prerequisite if the perturbation 
series is to make sense at all. If one takes the specific example of the Z+2-jet cross 
section (measured in [10]) then the NNLO divergent piece would be regulated physically
by confinement at the non-perturbative scale $\Lambda_{QCD}$, and would give a contribution of order
$\alpha_{EW} \alpha_s^4 \ln(p_t/\Lambda_{QCD})$. Since $\alpha_s(p_t) \ln(p_t/\Lambda_{QCD}) \sim 1$, this divergent NNLO contribution will be
of the same order as the NLO piece $\alpha_{EW} \alpha_s^3$. Therefore the NLO calculation has little formal
meaning for the midpoint algorithm, since contributions involving yet higher powers of $\alpha_s$
Observable                              1st miss cones at    Last meaningful order
Inclusive jet cross section             NNLO                 NLO
W/Z/H + 1 jet cross section             NNLO                 NLO
3 jet cross section                     NLO                  LO
W/Z/H + 2 jet cross section             NLO                  LO
jet masses in 3 jets, W/Z/H + 2 jets    LO                   none



Table 2: Summary of the order ($\alpha_s^4$ or $\alpha_s^3 \alpha_{EW}$) at which stable cones are missed in various
processes with a midpoint algorithm, and the corresponding last order that can be mean- 
ingfully calculated. Infrared unsafety first becomes visible one order beyond that at which 
one misses stable cones. 



will be parametrically as large as the NLO term.$^4$ The situation for a range of processes is
summarised in table 2.

4 An exact seedless cone jet definition 

One way in which one could imagine trying to 'patch' the seed-based iterative cone jet- 
algorithm to address the above problem would be to use midpoints between all pairs of 
particles as seeds, as well as midpoints between the initial set of stable cones.$^5$ However
it seems unlikely that this would resolve the fundamental problem of being sure that one
will systematically find all solutions of eq. (1) for any ensemble of particles.

Instead it is more appropriate to examine exhaustive, non-iterative approaches to the 
problem, i.e. an exact seedless cone jet algorithm, one that provably finds all stable cones, 
as advocated already some time ago in [16]. 

For very low multiplicities $N$, one approach is that suggested in section 3.3.3 of [6] and
used in the MCFM [18] and NLOJet [19] next-to-leading order codes. One first identifies
all possible subsets of the $N$ particles in the event. For each subset $S$, one then determines
the rapidity ($y_S$) and azimuth ($\phi_S$) of the total momentum of the subset, $p_S = \sum_{i \in S} p_i$,
and then checks whether a cone centred on $y_S$, $\phi_S$ contains all particles in $S$ but no other
particles. If this is the case then $S$ corresponds to a stable cone. This procedure guarantees
that all solutions to eq. (1) will be found.
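
The subset-based procedure is short enough to write out in full. The sketch below is an illustrative Python version, not the MCFM or NLOJet code: it assumes massless particles given as hypothetical (pt, y, phi) tuples, sums their four-momenta to obtain the subset axis, and tests whether the cone around that axis contains exactly the subset.

```python
import math
from itertools import combinations

def four_momentum(pt, y, phi):
    """(px, py, pz, E) of a massless particle (masslessness is an assumption)."""
    return (pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(y), pt * math.cosh(y))

def subset_axis(subset, particles):
    """Rapidity and azimuth of the total four-momentum of the subset."""
    px = py = pz = e = 0.0
    for i in subset:
        qx, qy, qz, qe = four_momentum(*particles[i])
        px, py, pz, e = px + qx, py + qy, pz + qz, e + qe
    return 0.5 * math.log((e + pz) / (e - pz)), math.atan2(py, px)

def in_cone(particle, axis, R):
    """True if the particle lies strictly within R of the axis in (y, phi)."""
    dphi = abs(particle[2] - axis[1]) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return (particle[1] - axis[0]) ** 2 + dphi ** 2 < R * R

def stable_cones_exhaustive(particles, R):
    """All stable cones, found by testing every non-empty subset (N 2^N cost)."""
    n = len(particles)
    stable = set()
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            axis = subset_axis(subset, particles)
            inside = tuple(i for i in range(n) if in_cone(particles[i], axis, R))
            if inside == subset:     # cone contains all of S and nothing else
                stable.add(frozenset(subset))
    return stable

# Particles 1-3 of table 1 with R = 1, then with the soft particle 4 added.
hard = [(400.0, 0.0, 0.0), (110.0, 0.9, 0.0), (90.0, 2.3, 0.0)]
stable_hard = stable_cones_exhaustive(hard, R=1.0)
stable_soft = stable_cones_exhaustive(hard + [(1.1, 1.5, 0.0)], R=1.0)
hard_content_soft = {cone - {3} for cone in stable_soft}
```

For the configuration of table 1 the exhaustive search finds the three stable cones with hard content 1+2, 2+3 and 3 whether or not the soft particle is present, in contrast to the seeded search.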

In the above procedure there are $\sim 2^N$ distinct subsets of particles and establishing
whether a given subset corresponds to a stable cone takes time $O(N)$. Therefore the
time to identify all stable cones is $O(N 2^N)$. For the values of $N$ ($\le 4$) relevant in fixed-order
calculations, $N 2^N$ time is manageable, however as soon as one wishes to consider

4 As concerns the measurement [10], the discussion is complicated by the confusion surrounding the
nomenclature of the seedless and midpoint algorithms: while it seems that the measurement was carried
out with a true midpoint algorithm, the calculation probably used the 'midpoint' as defined in section
3.4.2 of [6] (cf. footnote 2), which is actually the seedless algorithm, i.e. the measurements and theoretical
predictions are based on different algorithms.

5 This option was actually mentioned in [6] but rejected at the time as impractical. 
Figure 2: Representation of points on a line and the places where a sliding segment has a
change in its set of enclosed points. 

parton-shower or hadron-level events, with dozens or hundreds of particles, $N 2^N$ time is
prohibitive. A solution can only be considered realistic if it is polynomial in N, preferably 
with not too high a power of N. 

As mentioned in the introduction, approximate procedures for implementing seedless 
cone jet algorithms have been proposed in the past [6, 17]. These rely on considering the
momentum flow into discrete calorimeter towers rather than considering particles. As such
they are not entirely suitable for examining the full range of event levels, which go from
fixed-order (few partons), via parton shower level (many partons) and hadron-level, to detector
level which has both tracking and calorimetry information.

4.1 One-dimensional example 

To understand how one might construct an efficient exact seedless cone jet algorithm, it is 
helpful to first examine a one-dimensional analogue of the problem. The aim is to identify
all solutions to eq. (1), but just for (weighted) points on a line. The equivalent of a cone
of radius R is a segment of length 2R. 

Rather than immediately looking for stable segments one instead looks for all distinct 
ways in which the segment can enclose a subset of the points on the line. Then for each 
separate enclosure one calculates its centroid $C$ (weighted with the $p_t$ of the particles) and
verifies whether the segment centred on C encloses the same set of points as the original 
enclosure. If it does then C is the centre of a stable segment. 

A simple way of finding all distinct segment-enclosures is illustrated in fig. 2. First one
sorts the points into order on the line. One then places the segment far to the left and slides 
it so that it goes infinitesimally beyond the leftmost point. This is a first enclosure. Then 
one slides the segment again until its right edge encounters a new point or the left edge 
encounters a contained point. Each time either edge encounters a point, the point-content 
of the segment changes and one has a new distinct enclosure. Establishing the stability of 
each enclosure is trivial, since one knows how far the segment can move in each direction 
without changing its point content — so if the centroid is such that the segment remains 
within these limits, the enclosure corresponds to a stable segment. 

The computational complexity of the above procedure, $N \ln N$, is dominated by the
need to sort the points initially: there are O (N) distinct enclosures and, given the sorted 
list, finding the next point that will enter or leave an edge costs O (1) time, as does updating 
the weighted centroid (assuming rounding errors can be neglected), so that the time not 
associated with the sorting step is O (N). 
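
The sliding-segment procedure can be sketched as follows. This is an illustrative Python version with hypothetical names, assuming points in general position (no point exactly on a segment edge at an event). The segment centre sweeps from left to right; each point generates an 'enter' event at $x - R$ and a 'leave' event at $x + R$, and between two consecutive events the point content is fixed, so an enclosure is stable if its centroid lies between those two events.

```python
def stable_segments(points, R):
    """Stable segments of half-length R for weighted points on a line.

    points: list of (x, weight). Returns a list of (centroid, indices).
    """
    events = []
    for i, (x, _) in enumerate(points):
        events.append((x - R, 0, i))   # point i enters at the right edge
        events.append((x + R, 1, i))   # point i leaves at the left edge
    events.sort()                      # the N ln N step

    inside = set()
    w_sum = wx_sum = 0.0               # running sums, updated in O(1) per event
    stable = []
    for k, (centre, kind, i) in enumerate(events):
        x, w = points[i]
        if kind == 0:
            inside.add(i)
            w_sum, wx_sum = w_sum + w, wx_sum + w * x
        else:
            inside.discard(i)
            w_sum, wx_sum = w_sum - w, wx_sum - w * x
        if inside and k + 1 < len(events):
            next_centre = events[k + 1][0]
            c = wx_sum / w_sum
            # The content is unchanged for centres in (centre, next_centre).
            if centre < c < next_centre:
                stable.append((c, tuple(sorted(inside))))
    return stable

# Rapidities and transverse momenta of particles 1-3 of table 1, with R = 1.
segments = stable_segments([(0.0, 400.0), (0.9, 110.0), (2.3, 90.0)], R=1.0)
segment_contents = [indices for _, indices in segments]
```

Applied to the rapidities and transverse momenta of particles 1-3 of table 1, the sweep visits five distinct enclosures and identifies three of them as stable, with contents 1+2, 2+3 and 3.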
Figure 3: (a) Some initial circular enclosure; (b) moving the circle in a random direction
until some enclosed or external point touches the edge of the circle; (c) pivoting the circle 
around the edge point until a second point touches the edge; (d) all circles defined by pairs 
of edge points leading to the same circular enclosure. 



4.2 The two-dimensional case 
4.2.1 General approach 

The solution to the full problem can be seen as a 2-dimensional generalisation of the 
above procedure.$^6$ The key idea is again that of trying to identify all distinct circular
enclosures, which we also call distinct cones (by 'distinct' we mean having a different point 
content), and testing the stability of each one. In the one-dimensional example there was a 
single degree of freedom in specifying the position of the segment and all distinct segment 
enclosures could be obtained by considering all segments with an extremity defined by a 
point in the set. In 2 dimensions there are two degrees of freedom in specifying the position 
of a circle, and as we shall see, the solution to finding all distinct circular enclosures will 
be to examine all circles whose circumference lies on a pair of points from the set. 

To see in detail how one reaches this conclusion, it is useful to examine fig. 3. Box (a)
shows a circle enclosing two points, the (red) crosses. Suppose, in analogy with fig. 2, that
one wishes to slide the circle until its point content changes. One might choose a direction 
at random and after moving a certain distance, the circle's edge will hit some point in the 
plane, box (b), signalling that the point content is about to change. In the 1-dimensional 
case a single point, together with a binary orientation (taking it to be the left or right-hand 
point) were sufficient to characterise the segment enclosure. However in the 2-dimensional 
case one may orient the circle in an infinite number of ways. We can therefore pivot the 
circle around the boundary point. As one does this, at some point a second point will then 
touch the boundary of the circle, box (c). 

The importance of fig. 3 is that it illustrates that for each and every enclosure, one
can always move the corresponding circle (without changing the enclosure contents) into
a position where two points lie on its boundary.$^7$ Conversely, if one considers each circle

6 We illustrate the planar problem rather than the cylindrical one since for $R < \pi/2$ the latter is a
trivial generalisation of the former. 

7 There are two minor exceptions to this: (a) for any point separated from all others by more than $2R$,
the circle containing it can never have more than that one point on its edge — any such point forms a 
stable cone of its own; (b) there may be configurations where three or more points lie on the same circle 
whose boundary is defined by a pair of points in the set, and considers all four permutations
of the edge points being contained or not in the enclosure, then one will have identified 
all distinct circular enclosures. Note that one given enclosure can be defined by several 
distinct pairs of particles, which means that when considering the enclosures defined by all 
pairs of particles, we are likely to find each enclosure more than once, cf. fig. 3d.
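
The geometrical statement above translates directly into a brute-force enumeration, sketched here in Python for the planar problem. The sketch makes several simplifying assumptions that are not part of the text: points are in general position (no cocircular or coincident points), stability is tested with a $p_t$-weighted centroid, and no attempt is made at efficiency. All names are illustrative.

```python
import math
from itertools import combinations, product

def circle_centres(p, q, R):
    """Centres of the two circles of radius R passing through p and q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    d = math.hypot(dx, dy)
    if d == 0.0 or d > 2.0 * R:
        return []
    mx, my = 0.5 * (p[0] + q[0]), 0.5 * (p[1] + q[1])
    h = math.sqrt(max(0.0, R * R - 0.25 * d * d))  # midpoint-to-centre distance
    ux, uy = -dy / d, dx / d                        # unit normal to the chord
    return [(mx + h * ux, my + h * uy), (mx - h * ux, my - h * uy)]

def distinct_enclosures(points, R):
    """All distinct non-empty circular enclosures, as frozensets of indices."""
    n = len(points)
    found = set()
    has_neighbour = [False] * n
    for i, j in combinations(range(n), 2):
        for centre in circle_centres(points[i], points[j], R):
            has_neighbour[i] = has_neighbour[j] = True
            interior = frozenset(k for k in range(n)
                                 if k not in (i, j) and math.dist(points[k], centre) < R)
            # Four permutations of the two edge points being in or out.
            for take_i, take_j in product((False, True), repeat=2):
                edge = ([i] if take_i else []) + ([j] if take_j else [])
                found.add(interior | frozenset(edge))
    for i in range(n):
        if not has_neighbour[i]:        # isolated point: an enclosure of its own
            found.add(frozenset([i]))
    found.discard(frozenset())
    return found

def is_stable(enclosure, points, weights, R):
    """True if the circle centred on the weighted centroid has the same contents."""
    w = sum(weights[k] for k in enclosure)
    cx = sum(weights[k] * points[k][0] for k in enclosure) / w
    cy = sum(weights[k] * points[k][1] for k in enclosure) / w
    inside = frozenset(k for k in range(len(points))
                       if math.dist(points[k], (cx, cy)) < R)
    return inside == enclosure

points = [(0.0, 0.0), (0.9, 0.0), (2.3, 0.0)]   # (y, phi) of particles 1-3, R = 1
weights = [400.0, 110.0, 90.0]
enclosures = distinct_enclosures(points, R=1.0)
stable = {e for e in enclosures if is_stable(e, points, weights, R=1.0)}
```

For the three hard particles of table 1 the enumeration yields five distinct enclosures, of which three are stable. The same enclosure is reached from several circles, which is why the results are collected in a set.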

A specific implementation of the above approach to finding the stable cones is given 
as algorithm 2 below. It runs in expected time $O(Nn \ln n)$ where $N$ is the total number
of particles and $n$ is the typical number of particles in a circle of radius $R$.$^8$ The time
is dominated by a step that establishes a traversal order for the $O(Nn)$ distinct circular
enclosures, much as the one-dimensional ($N \ln N$) example was dominated by the step
that ordered the $O(N)$ distinct segment enclosures.$^9$ Some aspects of algorithm 2 are
rather technical and are explained in the subsubsection that follows. A reader interested 
principally in the physics of the algorithm may prefer to skip it on a first reading. 



4.2.2 Specific computational strategies 

A key input in evaluating the computational complexity of various algorithms is the knowl- 
edge of the number of distinct circular enclosures (or 'distinct cones') and the number of 
stable cones. These are both estimated in appendix A.1 and are respectively $O(Nn)$ and
(expected) $O(N)$.

Before giving the 2-dimensional analogue of the 1-d algorithm of section 4.1, we examine
a simple 'brute force' approach for finding all stable cones. One takes all $\sim Nn$ pairs of
points within $2R$ of each other and for each pair identifies the contents of the circle and
establishes whether it corresponds to a stable cone, at a cost of $O(N)$ each time, leading to
an overall $N^2 n$ total cost. This is to be compared to a standard midpoint cone algorithm,
whose most expensive step will be the iteration of the expected $O(Nn)$ midpoint seeds,
for a total cost also of $N^2 n$, assuming the average number of iterations from any given seed
to be $O(1)$.$^{10}$

One can reduce the computational complexity by using some of the ideas from the 1-d 
example, notably the introduction of an ordering for the boundary points of circles, and 
the use of the boundary points as sentinels for instability. Specifically, three elements will 
be required: 

i) one needs a way of labelling distinct cones that allows one to test whether two cones 
are the same at a cost of O (1); 

of radius $R$ (i.e. are cocircular): given a circle defined by a pair of them, the question of which of the
others is in the circle becomes ambiguous and one should explicitly consider all possible combinations of
inclusion/exclusion; a specific case of this is when there are collinear momenta (coincident points), which
can however be dealt with more simply by immediately merging them.

8 Given a detector that extends to rapidities $|y| < y_{max}$, $n/N \sim \pi R^2/(4\pi y_{max})$, which is considerably
smaller than 1; this motivates us to distinguish $n$ from $N$.

9 For comparison we note that the complexity of public midpoint algorithm implementations scales as
$N^2 n$.

10 In both cases one can reduce this to $N n^2$ by tiling the plane into squares of edge-length $R$ and
restricting the search for the circle contents to tiles in the vicinity of the circle centre. 
Algorithm 2 Procedure for establishing the list of all stable cones (protojets). For simplicity,
parts related to the special case of multiple cocircular points (see footnote 7) are
not shown. They are a straightforward generalisation of steps |2j to [2j 

1: For any group of collinear particles, merge them into a single particle.
2: for particle i = 1 ... N do
3:    Find all particles j within a distance 2R of i. If there are no such particles, i forms
      a stable cone of its own.
4:    Otherwise for each j identify the two circles for which i and j lie on the circumference.
      For each circle, compute the angle of its centre C relative to i, $\zeta = \arctan(\Delta\phi_{iC}/\Delta y_{iC})$.
5:    Sort the circles found in steps 3 and 4 into increasing angle $\zeta$.
6:    Take the first circle in this order, and call it the current circle. Calculate the total
      momentum and checkxor for the cones that it defines. Consider all 4 permutations
      of edge points being included or excluded. Call these the "current cones".
7:    repeat
8:       for each of the 4 current cones do
9:          If this cone has not yet been found, add it to the list of distinct cones.
10:         If this cone has not yet been labelled as unstable, establish if the in/out status
            of the edge particles (with respect to the cone momentum axis) is the same as
            when defining the cone; if it is not, label the cone as unstable.
11:      end for
12:      Move to the next circle in order. It differs from the previous one either by a
         particle entering the circle, or one leaving the circle. Calculate the momentum for
         the new circle and corresponding new current cones by adding (or removing) the
         momentum of the particle that has entered (left); the checkxor can be updated by
         XORing with the label of that particle.
13:   until all circles considered.
14: end for
15: for each of the cones not labelled as unstable do
16:    Explicitly check its stability, and if it is stable, add it to the list of stable cones
       (protojets).
17: end for
ii) one needs a way of ordering one's examination of cones so that one can construct the 
cones incrementally, so as not to pay the (at least, see below) $O(\sqrt{n})$ construction 
price anew for each cone; 

iii) one needs a way of limiting the number of cones for which we carry out a full stability 
test (which also costs at least $\sqrt{n}$). 

To label cones efficiently, we assign a random q-bit integer tag to each particle. Then we 
define a tag for combinations of particles by taking the logical exclusive-or of all the tags of 
the individual particles (this is easily constructed incrementally and is sometimes referred 
to as a checkxor). Then two cones can be compared by examining their tags, rather 
than by comparing their full list of particles. With such a procedure, there is a risk of 
two non-identical cones ending up with identical tags ('colliding'), which strictly speaking 
will make our procedure only 'almost exact'. The probability p of a collision occurring is 
roughly the square of the number of enclosures divided by the number of distinct tags. 
Since we have $O(Nn)$ enclosures, this gives $p \sim N^2 n^2/2^q$. By taking q sufficiently large 
(in a test implementation we have used q = 96) and using a random number generator 
that guarantees that all bits are decorrelated [22], one can ensure a negligible collision 
probability (see footnote 11). 
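The tagging scheme described above can be illustrated with a minimal Python sketch. The function names (`make_tags`, `checkxor`, `collision_estimate`) are hypothetical, and Python's built-in generator is used only as a stand-in; it is an assumption here, not the decorrelated-bit generator of [22]. The last lines evaluate the rough collision estimate of the text and the refined one of footnote 11 for $N = 10^4$, $n = 10^3$, q = 96.

```python
import random

def make_tags(num_particles, q=96, seed=1):
    """Assign a random q-bit integer tag to each particle (illustrative generator)."""
    rng = random.Random(seed)
    return [rng.getrandbits(q) for _ in range(num_particles)]

def checkxor(particle_indices, tags):
    """Tag of a set of particles: XOR of the individual particle tags."""
    tag = 0
    for i in particle_indices:
        tag ^= tags[i]
    return tag

def collision_estimate(num_a, num_b, q):
    """Rough collision probability: number of compared tag pairs over 2**q."""
    return num_a * num_b / 2.0 ** q

tags = make_tags(10)
tag = checkxor({1, 4, 7}, tags)
# Incremental update: particle 5 enters the cone, then particle 4 leaves it.
tag ^= tags[5]
tag ^= tags[4]
same_as_from_scratch = (tag == checkxor({1, 5, 7}, tags))

N, n, q = 10**4, 10**3, 96
p_all_pairs = collision_estimate(N * n, N * n, q)  # text: N^2 n^2 / 2^q
p_refined = collision_estimate(N, N * n, q)        # footnote 11: N^2 n / 2^q
```

Because XOR is its own inverse, the same operation handles a particle entering or leaving, which is what makes the incremental update in the circle traversal cheap.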

Given the ability to efficiently give a distinct label to distinct cones, one can address 
points ii) and iii) mentioned above by following algorithm 2. Point (ii) is dealt with by 
the steps that construct, order and traverse the circles: for each particle i, one establishes 
a traversal order for the circles having i on their edge; the traversal order is such that as 
one works through the circles, the circle content changes only by one particle at a time, 
making it easy to update the momentum and checkxor for the circle (see footnote 12). One 
maintains a record of all distinct cones in the form of a hash (as a hash function one simply 
takes $\log_2 Nn$ bits of the tag), so that it only takes $O(1)$ time to check whether a cone has 
been found previously. 
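The construction and ordering of the circles for one particle can be sketched as follows. This is a simplified illustration, not the SISCone code: points are taken in a flat (y, φ) plane with the periodicity of φ ignored, `atan2` is used so that the angle covers the full range, and the function names are hypothetical.

```python
import math

def circles_through(pi, pj, R):
    """Centres of the two circles of radius R with pi and pj on the circumference.

    pi and pj are (y, phi) points; returns [] if they coincide or are 2R or more apart.
    """
    dy, dphi = pj[0] - pi[0], pj[1] - pi[1]
    d = math.hypot(dy, dphi)
    if d == 0.0 or d >= 2 * R:
        return []
    mid = (pi[0] + 0.5 * dy, pi[1] + 0.5 * dphi)
    h = math.sqrt(R * R - 0.25 * d * d)   # distance from the midpoint to each centre
    ny, nphi = -dphi / d, dy / d          # unit vector perpendicular to i -> j
    return [(mid[0] + h * ny, mid[1] + h * nphi),
            (mid[0] - h * ny, mid[1] - h * nphi)]

def ordered_circles(i, points, R):
    """Circles having particle i on their edge, sorted by the angle of the centre seen from i."""
    pi = points[i]
    found = []
    for j, pj in enumerate(points):
        if j == i:
            continue
        for c in circles_through(pi, pj, R):
            zeta = math.atan2(c[1] - pi[1], c[0] - pi[0])
            found.append((zeta, j, c))
    found.sort()
    return found

points = [(0.0, 0.0), (0.5, 0.2), (3.0, 0.0)]
circles = ordered_circles(0, points, R=0.7)
```

Walking through `circles` in this order, consecutive circles differ by the particle j that defines them crossing the edge, which is the property exploited when updating the momentum and checkxor.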

Rather than explicitly checking the stability of each distinct cone, the algorithm examines 
whether the multiple edge points that define the cone are appropriately included/excluded 
in the circle around the cone's momentum axis (step 10). All but a tiny fraction of unstable 
cones fail this test, so that at the end of the loop over particles (step 14) one has a list 
(of size $O(N)$) of candidate stable cones; at that point one can carry out a full stability 
test for each of them. This therefore deals with point (iii) mentioned above. 

The dominant part of algorithm 2 is the ordering of the circles (step 5), which takes 
$n \ln n$ time and must be repeated N times. Therefore the overall cost is $Nn \ln n$. As 
well as computing time, a significant issue is the memory use, because one must maintain 
a list of all distinct cones, of which there are $O(Nn)$. One notes however that standard 



11 A more refined analysis shows that we need only worry about collisions between the tags of stable cones 
and other (stable or unstable) cones; since there are $O(N)$ stable cones, the actual collision probability 
is more likely to be $O(N^2 n)/2^q$. In practice for $N \sim 10^4$ and $n \sim 10^3$ (a very highly populated event) 
and using q = 96, this gives $p \sim 10^{-18}$. In principle to guarantee an infinitesimal collision probability 
regardless of N, q should scale as $\ln N$, however N will in any case be limited by memory use (which scales 
as Nn) so a fixed q is not unreasonable. 

12 Rounding errors can affect the accuracy of the momentum calculated this way; the impact of this can 
be minimised by occasionally recomputing the momentum of the circle from scratch. 
implementations of the split-merge step of the cone algorithm also require $O(Nn)$ storage, 
albeit with a smaller coefficient. 

It is worth highlighting also an alternative approach, which though slower, $O(Nn^{3/2})$, 
has lower memory consumption and also avoids the small risk of inexactness from the 
checkxor. It is similar to the brute-force approach, but uses 2-dimensional computational 
geometry tree structures, such as quad-trees [23] or k-d trees [24]. These involve successive 
sub-divisions of the plane (in quadrants, or pairs of rectangles), similarly to what is done 
in 1-dimensional binary trees. They make it possible to check the stability of a given circle 
in $\sqrt{n}$ time (the time is mostly taken by identifying tree cells near the edge of the circle, 
of which there are $O(\sqrt{n})$), giving an overall cost of $Nn^{3/2}$. The memory use of this form 
of approach is $O(N\sqrt{n})$, simply the space needed to store the stable-cone contents (see footnote 13). 

4.3 The split-merge part of the cone algorithm 

The split-merge part of our cone algorithm is basically that adopted for Run-II of 
the Tevatron [6]. It is shown in detail as algorithm 3. Since it does not depend on the 
procedure used to find stable cones, it may largely be kept as is. We do however include 
the following small modifications: 

1. The Run II proposal used $E_t$ throughout the split-merge procedure. This is not 
invariant under longitudinal boosts. We replace it with $\tilde p_t$, a scalar sum of the 
transverse momenta of the constituents of the protojet. This ensures that the results 
are both boost-invariant and infrared safe. We note that choosing instead $p_t$ (a 
seemingly natural choice, made for example in the code of [19, 13]) would have led 
to IR unsafety in purely hadronic events; the question of the variable to be used 
for the ordering is actually a rather delicate one, and we discuss it in more detail in 
appendix B.2. 

2. We introduce a threshold $p_{t,\min}$ below which protojets are discarded (step 2 of 
algorithm 3). This parameter is motivated by the discussion in [6] concerning problems 
associated with an 'excess' of stable cones in seedless algorithms, notably in events 
with significant pileup. It provides an infrared and collinear safe way of removing the 
resulting large number of low $p_t$ stable cones. By setting it to zero one recovers a 
behaviour identical to that of the Run-II algorithm (modulo the replacement $E_t \to \tilde p_t$, 
above), and we believe that in practice zero is actually a sensible default value. We 
note that a similar parameter is present in PxCone [12]. 

13 Though here we are mainly interested in exact approaches, one may also examine the question of 
the speed of the approximate seedless approach of Volobouev [17]. This approach represents the event 
on a grid and essentially calculates the stability of a cone at each point of the grid using a fast-Fourier 
transformation (FFT). In principle, for this procedure to be as good as the exact one, the grid should be 
fine enough to resolve each distinct cone, which implies that it should have $O(Nn)$ points; therefore the 
FFT will require $O(Nn \ln Nn)$ time, which is similar in magnitude to the time that is needed by the exact 
algorithm. An open question remains that of whether a coarser grid might nevertheless be 'good enough' 
for many practical applications. 
Algorithm 3 The disambiguated, scalar $\tilde p_t$ based formulation of a Tevatron Run-II type 
split-merge procedure [6], with overlap threshold parameter f and transverse momentum 
threshold $p_{t,\min}$. To ensure boost invariance and IR safety, for the ordering variable and the 
overlap measure, it uses $\tilde p_{t,\rm jet} = \sum_{i \in \rm jet} |p_{t,i}|$, i.e. a scalar sum of the particle transverse 
momenta (as in a '$p_t$' recombination scheme). 

 1: repeat 
 2:    Remove all protojets with $\tilde p_t < p_{t,\min}$. 
 3:    Identify the protojet (i) with the highest $\tilde p_t$. 
 4:    Among the remaining protojets identify the one (j) with highest $\tilde p_t$ that shares 
       particles (overlaps) with i. 
 5:    if there is such an overlapping jet then 
 6:       Determine the total $\tilde p_{t,\rm shared} = \sum_{k \in i \,\&\, j} |p_{t,k}|$ of the particles shared between i and j. 
 7:       if $\tilde p_{t,\rm shared} < f\,\tilde p_{t,j}$ then 
 8:          Each particle that is shared between the two protojets is assigned to the one to 
             whose axis it is closest. The protojet momenta are then recalculated. 
 9:       else 
10:          Merge the two protojets into a single new protojet (added to the list of protojets, 
             while the two original ones are removed). 
11:       end if 
12:       If steps 7 to 11 produced a protojet that coincides with an existing one, maintain the 
          new protojet as distinct from the existing copy(ies). 
13:    else 
14:       Add i to the list of final jets, and remove it from the list of protojets. 
15:    end if 
16: until no protojets are left. 



3. After steps 7 to 11 the same protojet may appear more than once in the list of protojets. 
For example a protojet may come once from a single original stable cone, and a second 
time from the splitting of another original stable cone. The original statement of the 
split-merge procedure [6] did not address this issue, and there is a resulting ambiguity 
in how to proceed. One option (as is done for example in the seedless cone code of 
[15]) is to retain only a single copy of any such identical protojets. This however 
introduces a new source of infrared unsafety: an added soft particle might appear in 
one copy of the protojet and not the other and the two protojets would then no longer 
be identical and would not be reduced to a single protojet. This could (and does 
occasionally, as evidenced in section 5.1) alter the subsequent split-merge sequence. 
If one instead maintains multiple identical protojets as distinct entities (as is done in 
the codes of [13J, HE]), then the addition of a soft particle does not alter the number 
of hard protojet entries in the protojet list and the split-merge part of the algorithm 
remains infrared safe. We therefore choose this second option, and make it explicit 
as step 12 of algorithm 3. 

The split-merge procedure is guaranteed to terminate because the number of overlapping 
pairs of protojets is reduced each time an iteration of the loop finds an overlap. A proof of 
the infrared safety of this (and the other) parts of our formulation of the cone algorithm is 
given in appendix B. The computational complexity ($O(N^2)$) of the split-merge procedure 
is generally smaller than that of the stable-cone search, and so we relegate its discussion 
to appendix A.2. 
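The split-merge loop can be sketched in a few lines of Python. This is a toy version under stated assumptions, not the SISCone implementation: particles are (p_t, y, φ) tuples, protojets are sets of particle indices, the jet axis is approximated by the p_t-weighted centroid in a flat (y, φ) plane, and the function names are hypothetical. Identical protojets are kept as distinct list entries, as in the disambiguated formulation.

```python
import math

def pt_tilde(jet, particles):
    """Scalar sum of the transverse momenta of the jet constituents."""
    return sum(particles[k][0] for k in jet)

def axis(jet, particles):
    """pt-weighted (y, phi) centroid; a simplifying assumption for the jet axis."""
    w = pt_tilde(jet, particles)
    return (sum(particles[k][0] * particles[k][1] for k in jet) / w,
            sum(particles[k][0] * particles[k][2] for k in jet) / w)

def split_merge(protojets, particles, f=0.5, pt_min=0.0):
    """Toy split-merge; identical protojets stay distinct entries of the list."""
    protojets = [set(p) for p in protojets if p]
    jets = []
    while True:
        # Discard protojets below the threshold, then order by scalar pt.
        protojets = [p for p in protojets
                     if p and pt_tilde(p, particles) >= pt_min]
        if not protojets:
            break
        protojets.sort(key=lambda p: pt_tilde(p, particles), reverse=True)
        i = protojets[0]
        overlapping = [p for p in protojets[1:] if i & p]
        if not overlapping:
            jets.append(protojets.pop(0))      # i becomes a final jet
            continue
        j = overlapping[0]                     # hardest protojet overlapping with i
        shared = i & j
        if pt_tilde(shared, particles) < f * pt_tilde(j, particles):
            # Split: each shared particle goes to the protojet with the closer axis.
            ai, aj = axis(i, particles), axis(j, particles)
            for k in shared:
                y, phi = particles[k][1], particles[k][2]
                di = math.hypot(y - ai[0], phi - ai[1])
                dj = math.hypot(y - aj[0], phi - aj[1])
                (j if di <= dj else i).discard(k)
        else:
            # Merge: replace i and j by their union.
            merged = i | j
            protojets = [p for p in protojets if p is not i and p is not j]
            protojets.append(merged)
    return jets

particles = [(100.0, 0.0, 0.0), (10.0, 0.5, 0.0), (80.0, 1.0, 0.0)]
split_result = split_merge([{0, 1}, {1, 2}], particles, f=0.5)
merge_result = split_merge([{0, 1}, {1, 2}], particles, f=0.05)
```

In the example the shared particle carries a scalar p_t of 10, which is below f times the softer protojet's 90 for f = 0.5 (split) and above it for f = 0.05 (merge).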

Finally, before closing this section, let us return briefly to the top-level of the cone 
formulation, algorithm 1, and the question of the loop over multiple passes. This loop 
contains just the stable-cone search, and one might wonder why the split-merge step has 
not also been included in the loop. First consider $p_{t,\min} = 0$: protojets found in different 
passes cannot overlap, and the split-merge procedure is such that if a particle is in a 
protojet then it will always end up in a jet. Therefore it is immaterial whether the 
split-merge step is kept inside or outside the loop. The advantage of keeping it outside the loop 
is that one may rerun the algorithm with multiple overlap values f simply by repeating 
the split-merge step, without repeating the search for stable cones. For $p_{t,\min} \neq 0$ the 
positioning of the split-merge step with respect to the $N_{\rm pass}$ loop would affect the outcome 
of the algorithm if all particles not found in first-pass jets were to be inserted into the 
second pass stable-cone search. Our specific formulation constitutes a design choice, which 
allows one to rerun with different values of f and $p_{t,\min}$ without repeating the stable-cone 
search. 

5 Tests and comparisons 
5.1 Measures of IR (un)safety 

In section [4] we presented a procedure for finding stable cones that is explicitly IR safe. In 
appendix [B] we provide a proof of the IR safety of the rest of the algorithm. The latter is 
rather technical and not short, and while we have every reason to believe it to be correct, 
we feel that there is value in supplementing it with complementary evidence for the IR 
safety of the algorithm. As a byproduct, we will obtain a measure of the IR unsafety of 
various commonly used formulations of the cone algorithm. 

To verify the IR safety of the seedless cone algorithm, we opt for a numerical Monte 
Carlo approach, in analogy with that used in [25] to test the more involved recursive 
infrared and collinear safety (a prerequisite for certain kinds of resummation). The test 
proceeds as follows. One generates a 'hard' event consisting of some number of randomly 
distributed momenta of the order of some hard scale $p_{t,H}$, and runs the jet algorithm on the 
hard event. One then generates some soft momenta at a scale $p_{t,S} \ll p_{t,H}$, adds them to the 
hard event (randomly permuting the order of the momenta) and reruns the jet algorithm. 
One verifies that the hard jets obtained with and without the soft event are identical. If 
they are not, the jet algorithm is IR unsafe. For a given hard event one repeats the test 
with many different add-on soft events so as to be reasonably sure of identifying most hard 
events that are IR unsafe. One then repeats the whole procedure for many hard events. 
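The structure of such a test can be sketched as below. This is a minimal illustration, not the code used for the results quoted here: the criterion for "identical hard jets" (comparing the hard constituents of each jet, above a cut `pt_cut`) is an assumption, and the two toy "jet algorithms" are hypothetical stand-ins chosen only so that one passes and one fails; neither is a real cone algorithm.

```python
import random

def hard_content(jets, pt_cut):
    """Order-independent form of the jets, restricted to particles above pt_cut."""
    kept = []
    for jet in jets:
        hard = tuple(sorted(p for p in jet if p[0] > pt_cut))
        if hard:
            kept.append(hard)
    return sorted(kept)

def passes_ir_test(jet_algorithm, hard_event, make_soft_event, n_trials, pt_cut, rng):
    """Return False as soon as adding soft particles changes the hard jets."""
    reference = hard_content(jet_algorithm(list(hard_event)), pt_cut)
    for _ in range(n_trials):
        event = list(hard_event) + make_soft_event(rng)
        rng.shuffle(event)                      # randomly permute the input order
        if hard_content(jet_algorithm(event), pt_cut) != reference:
            return False
    return True

# Toy algorithms, purely illustrative.
def everything_in_one_jet(event):
    return [list(event)]

def seeded_by_softest(event, radius=1.5):
    seed = min(event)                           # the softest particle is the only seed
    inside = [p for p in event if abs(p[1] - seed[1]) < radius]
    outside = [p for p in event if abs(p[1] - seed[1]) >= radius]
    return [jet for jet in (inside, outside) if jet]

hard_event = [(100.0, 0.0, 0.0), (50.0, 2.0, 0.0)]          # (pt, y, phi)
make_soft = lambda rng: [(1e-6, 1.0, rng.uniform(-3.0, 3.0))]
rng = random.Random(7)
ok_safe = passes_ir_test(everything_in_one_jet, hard_event, make_soft, 20, 1.0, rng)
ok_unsafe = passes_ir_test(seeded_by_softest, hard_event, make_soft, 20, 1.0, rng)
```

The second toy algorithm fails because a soft particle placed between the two hard ones acts as a new seed and pulls them into one jet, which is the same mechanism that makes seeded cone algorithms IR unsafe.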

The hard events are produced as follows: we choose a linearly distributed random 
Algorithm            Type                                         IR unsafe      Code 
JetClu               Seeded, no midpoints                         2h+1s [9]      [?] 
SearchCone           Seeded, search cone [21], midpoints          2h+1s [?]      [?] 
MidPoint             Seeded, midpoints (2-way)                    3h+1s [?]      [15] 
MidPoint-3           Seeded, midpoints (2-way, 3-way)             3h+1s          [?] 
PxCone               Seeded, midpoints (n-way), non-standard SM   3h+1s          [12] 
Seedless [SM-p_t]    Seedless, SM uses p_t                        4h+1s (a)      [here] 
Seedless [SM-MIP]    Seedless, SM merges identical protojets      4h+1s (b)      [here] 
Seedless [SISCone]   Seedless, SM of algorithm 3                  no             [here] 

(a) Failures on 4h+1s arise only for R > π/4; for smaller R, failures arise only for higher multiplicities. 
(b) Failures for 4h+1s are extremely rare, but become more common for 5h+1s and beyond. 

Table 3: Summary of the various cone jet algorithms and the code used for tests here; 
SM stands for "split-merge"; Nh+Ms indicates that infrared unsafety is revealed with 
configurations consisting of N hard particles and M soft ones, not counting an additional 
hard, potentially non-QCD, particle to conserve momentum. All codes have been used in 
the form of plugins to FastJet (v2.1) [20]. 

number of momenta (between 2 and 10) and for each one generate a random $p_t$ (linearly 
distributed, $2^{-24}\, p_{t,H} < p_t < p_{t,H}$, with $p_{t,H} = 1000$ GeV), a random rapidity (linearly 
distributed in $-1.5 < y < 1.5$) and a random $\phi$. For each hard event we also choose 
random parameters for the jet algorithm, so as to cover the jet-algorithm parameter space 
($0.3 < R < 1.57$, $0.25 < f < 0.95$, linearly distributed, the upper limit on R being motivated 
by the requirement that $R < \pi/2$; the $p_{t,\min}$ on protojets is set to 0 and the number of 
passes is set to 1). For each add-on soft event we generate between 1 and 5 soft momenta, 
distributed as the hard ones, but with the soft scale $p_{t,S} = 10^{-100}$ GeV replacing $p_{t,H}$. 

We note that the hard events generated as above do not conserve momentum — they 
are analogous to events with a missing energy component or with identified photons or 
leptons that are not given as inputs to the jet clustering. For the safety studies on the 
full SISCone algorithm, we therefore also generate a set of hard events which do have 
momentum conservation, analogous to purely hadronic events. 

To validate our approach to testing IR safety, we apply it to a range of cone jet 
algorithms, listed in table 3, including the many variants that are IR unsafe. In PxCone the 
cut on protojets is set to 1 GeV and in the SearchCone algorithm the search cone radius 
is set to R/2. 

The fraction of hard events failing the safety test is shown in fig. 4 for each of the jet 
algorithms (see footnote 14). All jet algorithms that are known to be IR unsafe do indeed fail the tests. 
One should be aware that the absolute failure rates depend to some extent on the way we 

14 The results are based on 80 trial soft add-on events for each hard event and should differ by no more 
than a few percent (relative) from a full determination of the IR safety for each hard event (which would be 
obtained in the limit of an infinite number of trial soft add-on events for each hard event). For SISCone we 
only use 20 soft add-on events, so as to make it possible to probe a larger number of hard configurations. 



[Bar chart. Horizontal axis: fraction of hard events failing IR safety test. 
JetClu 50.1%; SearchCone 48.2%; MidPoint 16.4%; MidPoint-3 15.6%; PxCone 9.3%; 
Seedless [SM-p_t] 1.6%; Seedless [SM-MIP] 0.17%; Seedless (SISCone) $< 10^{-9}$.] 


Figure 4: Failure rates for the IR safety tests. The algorithms are as detailed in table 3. 
Seeded algorithms have been used with a zero seed threshold. The events used do not 
conserve momentum (i.e. have a missing energy component), except for the seedless SM-$p_t$ 
case (where all events conserve momentum, to highlight the issue that arises in that case) 
and for SISCone (where we use a mix of momentum conserving and non-conserving events 
so as to fully test the algorithm). Further details are given in the text. 

generated the hard events, and so are to be interpreted with caution. Having said that, 
our hard events have a complexity similar to that of the Born-level (lowest-order parton-level) 
events that will be studied at LHC, for example in the various decay channels of $t\bar t H$ 
production, and so both the orders of magnitude of the failure rates and their relative sizes 
should be meaningful. 

Algorithms that fail on '2h+1s' events have larger failure rates than those that fail 
on '3h+1s' events, as would be expected: they are 'more' infrared unsafe. One notes 
the significant failure rates for the midpoint algorithms, ~ 16%, and the fact that adding 
3-way midpoints (i.e. between triplets of stable cones) has almost no effect on the failure 
rate, indicating that triangular configurations identified as IR unsafe in pQ are much less 
important than others such as that discussed in section 3. PxCone's smaller failure rate 
seems to be due not to its multi-way midpoints, but rather to its specific split-merge 
procedure which leads to fewer final jets (so that one is less sensitive to missing stable 
cones) . 

Seedless algorithms with problematic split-merge procedures lead to small failure rates 
(restricting one's attention to small values of R, these values are further reduced). One 
might be tempted to argue that such small rates of IR safety failure are unlikely to have 
a physical impact and can therefore be ignored. However there is always a risk of some 
specific study being unusually sensitive to these configurations, and in any case our aim 
here is to provide an algorithm whose IR safety is exact, not just approximate. 

Finally, with a 'good' split-merge procedure, that given as algorithm 3, none of the over 
$5 \times 10^9$ hard events tested (a mix both with and without momentum conservation) failed 
the IR safety test. For completeness, we have carried out limited tests also for $N_{\rm pass} = \infty$ 
and with a $p_{t,\min}$ on protojets of 100 GeV, and have additionally performed tests with a 
larger range of rapidities ($|y| < 3$), collinearly-split momenta, cocircular configurations, 
three scales instead of two scales and again found no failures. These tests together with 
the proof given in appendix [B] give us a good degree of confidence that the algorithm truly 
is infrared safe, hence justifying its name. 

5.2 Speed 

As can be gathered from the discussion in [6], reasonable speed is an essential requirement 
if a new variant of cone jet algorithm is to be adopted. To determine the speed of various 
cone jet algorithms, we use the same set of events taken for testing the FastJet formulation 
of the $k_t$ jet algorithm in [20]: these consist of a single Pythia [26] dijet event (with 
$p_{t,\rm jets} \simeq 50$ GeV) to which we add varying numbers of simulated minimum bias events so 
as to vary the multiplicity N. Thus the event structure should mimic that of LHC events 
with pileup. 

Figure [5] shows the time needed to find jets in one event as a function of N. Among 
the seeded jet algorithms we consider only codes that include midpoint seeds. For the 
(CDF) midpoint code [15], written in C++, there is an option of using only particles above 
a threshold s as seeds and we consider both the common (though collinear unsafe) choice 
s = 1 GeV and the (collinear safe but IR unsafe) s = 0 GeV. The PxCone code [12], 
written in Fortran 77, has no seed threshold. 

Our seedless code, SISCone, is comparable in speed to the fastest of the seeded codes, 
the CDF midpoint code with a seed threshold s = 1 GeV, and is considerably faster than 
the codes without a seed threshold (not to mention existing exact seedless codes which take 
~ 1 s to find jets among 20 particles and scale as $N 2^N$). Its run time also increases more 
slowly with N than that of the seeded codes, roughly in agreement with the expectation 
of SISCone going as $Nn \ln n$ (with a large coefficient) while the others go as $N^2 n$. The 
midpoint code with s = 1 GeV has a more complex N-dependence presumably because 
we have run the timing on a single set of momenta, and the proportionality between the 
number of seeds and N fluctuates and depends on the event structure. 

For comparison purposes we have also included the timings for the FastJet (v2) $k_t$ 
implementation, which for these values of N uses a strategy that involves a combination of $N \ln N$ 
and Nn dependencies. Timings for the FastJet implementation of the Aachen/Cambridge 
algorithm are similar to those for the $k_t$ algorithm. 

5.3 $R_{\rm sep}$: an inexistent problem 

Suppose we have two partons separated by $\Delta R$ and with transverse momenta $p_{t1}$ and $p_{t2}$ 
($p_{t1} > p_{t2}$). Both partons end up in the same jet if the cone containing both is stable, i.e. 
Figure 5: Time to cluster N particles, as a function of N, for various algorithms, with 
R = 0.7 and f = 0.5, on a 3.4GHz Pentium® IV processor. For the CDF midpoint 
algorithm, s is the threshold transverse momentum above which particles are used as 
seeds. 



if 

$$\frac{\Delta R}{R} < 1 + z, \qquad z = \frac{p_{t2}}{p_{t1}}, \qquad (3)$$ 

where the result is exact for small R or with $p_t$-scheme recombination. Equivalently one 
can write the probability for two partons to be clustered into a single jet as 

$$P_{2\to1}(\Delta R, z) = \Theta\left(1 + z - \frac{\Delta R}{R}\right). \qquad (4)$$ 

The limit on $\Delta R/R$ ranges from 1 for z = 0 to 2 for z = 1. This z-dependent limit is the 
main low-order perturbative difference between the cone algorithm and inclusive versions 
of sequential recombination ones like the $k_t$ or Cambridge/Aachen algorithms, since the 
latter merge two partons into a single jet for $\Delta R/R < 1$, independently of their energies. 
A statement regularly made about cone algorithms (see for example [211 EH !2Zj) is 
that parton showering and hadronisation reduce the stability of the cone containing the 
'original' two partons, leading to a modified 'practical' condition for two partons to end 
Figure 6: Schematic representation of the phase space region in which two partons will 
end up in a single cone jet versus two jets, at the 2-parton level (PT) and, according to 
the $R_{\rm sep}$ statement, after showering and hadronisation (NP). 



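up in a single jet, 

$$\frac{\Delta R}{R} < \min(R_{\rm sep},\, 1 + z), \qquad (5)$$ 

or equivalently, 

$$P_{2\to1}(\Delta R, z) = \Theta\left(\min(R_{\rm sep},\, 1 + z) - \frac{\Delta R}{R}\right), \qquad (6)$$ 

with $R_{\rm sep} \simeq 1.3$ [28, 29] (see footnote 15). This situation is often represented as in figure [6], which depicts 
the $\Delta R$, z plane, and shows the regions in which two partons are merged into one jet or 
resolved as two jets. The boundary $\Delta R/R = 1 + z$ corresponds to eq. (3), while the alternative 
boundary at $\Delta R/R = R_{\rm sep}$ is eq. (5). 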

So large a difference between the low-order partonic expectation and hadron-level results 
would be quite a worrying feature for a jet algorithm. After all, the main purpose of a 
jet algorithm is to give as close a relation as possible between the first couple of orders of 
perturbation theory and hadron level (see footnote 16). 

The evidence for the existence of eq. (5) with $R_{\rm sep} = 1.3$ seems largely to be based [28, 
29] on merging two events (satisfying some cut on the jet $p_t$s), running the jet-algorithm 
on the merged event, and examining at what distance particles from the two events end 
up in the same jet. This approach indicated that particles were indeed less likely to end 
up in the same jet if they were more than 1.3R apart, however the result is an average 
over a range of z values making it hard to see whether eq. (5) is truly representative of the 



15 The name $R_{\rm sep}$ was originally introduced [30] in the context of NLO calculations of hadron-collider 
jet-spectra, but with a different meaning: there it was intended as a free parameter to model the lack 
of knowledge about the details of the definition of the cone jet algorithm used experimentally. This is 
rather different from the current use as a parameter intended to model our inability to directly calculate 
the impact of higher-order and non-perturbative dynamics of QCD in cone algorithms. 

16 The apparent lack of correspondence is considered sufficiently severe that in some publications {e.g. 
[IT] ) the NLO calculation is modified by hand to compensate for this. 
Figure 7: The probability $P_{2\to1}(\Delta R, z)$ for two $k_t$-algorithm subjets to correspond to a 
single cone jet, as a function of $z = p_{t2}/p_{t1}$ and $\Delta R$ for the two $k_t$ subjets. Events have been 
generated with Herwig [31] (hadron-level includes the underlying event) and the results 
are based on studying all $k_t$ jets with $p_t > 50$ GeV and $|y| < 1$. Further details are to be 
found in the text. 



underlying physics (see footnote 17). 

To address the question in more depth we adopt the following strategy. Rather than 
combining different events, we use one event at a time, but with two different jet algorithms. 
On one hand we run SISCone with a fairly small value of R, $R_{\rm cone} = 0.4$. Simultaneously 
we run inclusive $k_t$ jet-clustering [2] on the event, using a relatively large R ($R_{k_t} = 1.0$), 
and identify any hard $k_t$ jets. For each hard $k_t$ jet we undo its last clustering step so as to 
obtain two subjets, $S_1$ and $S_2$; these are taken to be the analogues of the two partons. 
We then examine whether there is a cone jet that contains more than half of the $p_t$ of 
each of $S_1$ and $S_2$. If there is, the conclusion is that the two $k_t$ subjets have ended up 
(dominantly) in a single cone jet. 
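The matching criterion can be sketched as follows. Jets and subjets are represented as sets of particle indices and the "$p_t$ of a subjet" is taken as the scalar sum of its constituents' transverse momenta; both are simplifying assumptions for illustration, and the function names are hypothetical.

```python
def pt_sum(indices, pts):
    """Scalar sum of transverse momenta over a set of particle indices."""
    return sum(pts[k] for k in indices)

def matched_to_single_cone(subjet1, subjet2, cone_jets, pts):
    """True if some cone jet contains more than half of the pt of each subjet."""
    for cone in cone_jets:
        if (pt_sum(subjet1 & cone, pts) > 0.5 * pt_sum(subjet1, pts)
                and pt_sum(subjet2 & cone, pts) > 0.5 * pt_sum(subjet2, pts)):
            return True
    return False

pts = [30.0, 5.0, 20.0, 4.0]          # particle transverse momenta (GeV)
s1, s2 = {0, 1}, {2, 3}               # the two kt subjets
one_cone = matched_to_single_cone(s1, s2, [{0, 2, 3}], pts)
two_cones = matched_to_single_cone(s1, s2, [{0, 1}, {2, 3}], pts)
```

Averaging this boolean over many events in bins of $\Delta R$ and z gives an estimate of the probability studied in the text.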

The procedure is repeated for many events, and one then examines the probability, 
$P_{2\to1}(\Delta R, z)$, of the two $k_t$ subjets being identified with a single cone jet, as a function of 
the distance $\Delta R$ between the two subjets, $S_1$ and $S_2$, and the ratio z of their $p_t$'s. The 
results are shown in fig. [7] both at parton-shower level and at hadron level, as simulated 
with Herwig [31]. The middle contour corresponds to a probability of 1/2. At parton- 

17 A preliminary version of ,2?j showed more differential results; these, however, seem not to be in the 
definitive version. 
shower level this contour coincides remarkably well with the boundary defined by eq. (3), 
up to $\Delta R/R = 1.7$. It is definitely not compatible with eq. (5) with $R_{\rm sep} = 1.3$. Beyond 
$\Delta R/R = 1.7$ the contour bends a little and one might consider interpreting this as an 
$R_{\rm sep} \simeq 1.8$ (see footnote 18). However, in that region the transition between P = 1 and P = 0 is broad, 
and to within the width of the transition, there remains good agreement with eq. (3); it 
seems more natural therefore to interpret the small deviation from eq. (3) as a Sudakov- 
shoulder type structure [32], which broadens and shifts the $\Theta$-function of eq. (4), as would 
happen with almost any discontinuity in a leading-order QCD distribution. 

Once one includes hadronisation effects in the study, fig. 7b, one finds that the transition 
region broadens further, as is to be expected. Now the P = 1/2 contour shifts away slightly 
from the 1 + z result at small z as well. However, once again this shift is modest, and of 
similar size as the breadth of the transition region. 

To verify the robustness of the above results we have examined other related indicators. 
One of them is the probability, $P_{2\to2}$, of finding two cone jets, each containing more than 
half of the transverse momentum of just one of the $k_t$ subjets. At two-parton level, one 
expects $P_{2\to1} + P_{2\to2} = 1$. Deviation from this would indicate that our procedure for 
matching cone jets to $k_t$ jets is misbehaving. We find that the relation holds to within 
around 15% over most of the region, deviating by at most ~ 25% in a small corner of phase 
space $\Delta R/R \sim 1.5$, $z \sim 0.2$. Another test is to examine the fraction $F_2$ of the softer $S_2$'s 
transverse momentum that is found in the cone that overlaps dominantly with $S_1$. At two- 
parton level this should be equal to $P_{2\to1}$, but this would not be the case after showering 
if there were underlying problems with our matching procedure. We find however that $F_2$ 
does agree well with $P_{2\to1}$. These, together with yet further tests, lead us to believe that 
conclusions drawn from fig. [7] are robust. Finally, while these results have been obtained 
within a Monte Carlo simulation, Herwig, a similar study could equally well be carried 
out experimentally on real events. 

So, in contrast to statements that are often made about the cone jet algorithm, the 
perturbative picture of when two partons will recombine, given by eq. (3), seems to be a 
relatively good indicator of what happens even after perturbative radiation and 
hadronisation. In particular the evidence that we have presented strongly disfavours the $R_{\rm sep}$-based 
modification, eq. (5). This is a welcome finding, and should help provide a firmer basis for 
cone-based phenomenology. 

5.4 Physics impact of seedless v. midpoint cone 

In this section, we discuss the impact on physical measurement of switching from a mid- 
point type algorithm to a seedless IR-safe one such as SISCone. We study two physical 
observables, the inclusive jet spectrum and the jet mass spectrum in 3-jet events. The 
spectra have been obtained by generating events with a Monte-Carlo either at fixed order 
in perturbation theory (NLOJet [19]) or with parton showering and hadronisation (Pythia 

18 Such a value has been mentioned to us independently by M. Wobisch in the context of unpublished 
studies of jet shapes for the SearchCone algorithm [21] . 
[26]), and by performing the jet analysis on each event using three different algorithms 
(each with R = 0.7 and f = 0.5, and additionally in the case of SISCone, $N_{\rm pass} = 1$ and 
$p_{t,\min} = 0$): 

1. SISCone: the seedless, IR-safe definition described in algorithms 1 to 3; 

2. midpoint(0): the midpoint algorithm using all particles as seeds; 

3. midpoint(l): the midpoint algorithm using as seeds all particles above a threshold 
of 1 GeV. 

We have used a version of the CDF implementation of the midpoint algorithm modified to 
have the split-merge step based on $\tilde p_t$ rather than $p_t$ (so that it corresponds to algorithm 3 
with $p_{t,\min} = 0$). The motivation for this is that we are mainly interested in the physics 
impact of having midpoint versus all stable cones, and the comparison is simplest if the 
subsequent split-merge procedure is identical in both cases (see footnote 19). 

We shall first present the results obtained for the inclusive jet spectrum and then discuss
the jet mass spectrum in 3-jet events. Most studies carried out in this section have used
kinematics corresponding to the Tevatron Run II, i.e. a centre-of-mass energy $\sqrt{s} = 1.96$
TeV, and usually, for simplicity, we have chosen not to impose any cuts in rapidity.



5.4.1 Inclusive jet spectrum 

As discussed in section [3], the differences between the midpoint algorithm and SISCone are
expected to start when we have 3 particles in a common neighbourhood plus one to balance
momentum. For pure QCD processes this corresponds to $2 \to 4$ diagrams, $\mathcal{O}(\alpha_s^4)$. This is
NNLO for the inclusive spectrum. Though an NNLO calculation of the inclusive spectrum
is beyond today's technology (for recent progress, see [33]), we can easily calculate the
$\mathcal{O}(\alpha_s^4)$ difference between midpoint and SISCone, using just tree-level $2 \to 4$ diagrams,
since the difference between the algorithms is zero at orders $\alpha_s^2$ and $\alpha_s^3$, i.e. we can neglect
two-loop $2 \to 2$ diagrams and one-loop $2 \to 3$ diagrams. The significance of the difference
can be understood by comparing to the leading order spectrum, which is identical for the
two algorithms.

Figure [8] shows the resulting spectra: the upper plot gives the leading order inclusive
spectrum together with the difference between SISCone and midpoint(0) at $\mathcal{O}(\alpha_s^4)$. The
lower plot shows the relative difference. One sees that the use of the IR-safe seedless cone
algorithm introduces modest corrections, of order 1-2%, in the inclusive jet spectrum. This
order of magnitude is roughly what one would expect, since the differences only appear at
relative order $\alpha_s^2$. As we will see below, larger differences will appear when one examines
more exclusive quantities.

19 We could also have compared SISCone with a midpoint algorithm using $p_t$ in the split-merge (a
common default); the figures we show below would have stayed unchanged at the 1% level for the inclusive
spectrum, while for the jet masses the effects range between a few percent at moderate masses and 10-20%
in the high-mass tail.
Figure 8: (a) Inclusive jet spectrum: the upper curve gives the leading-order ($\mathcal{O}(\alpha_s^2)$)
spectrum, while the lower (blue) curve gives the difference between the SISCone and
midpoint(0) algorithms, obtained from the $\mathcal{O}(\alpha_s^4)$ tree-level amplitude; (b) the relative
difference.
Figure 9: Relative difference between the inclusive jet spectra for midpoint(1) and SISCone,
obtained from Pythia at parton level, hadron level without underlying event (UE)
contributions, and hadron level with UE. Shown (a) for Tevatron collisions and (b) for
LHC collisions.



In addition, we have used Herwig and Pythia to investigate the differences between
midpoint(1) and SISCone with parton showering. Both generators give similar results, and
we show the results just of Pythia, fig. [9]a. The difference at parton level is very similar
to what was observed at fixed order. At hadron level without underlying event (UE)
corrections, the difference remains at the level of 1-2% (though it changes sign); once one
includes the underlying event contributions, the difference increases noticeably at lower
$p_t$; this is because the midpoint(1) algorithm receives somewhat larger UE corrections
than SISCone. Since the underlying event is one of the things that is likely to change from
Tevatron to LHC, in figure [9]b we show similar curves for LHC kinematics. At parton level
and at hadron level without the underlying event, the results are essentially the same as for
the Tevatron. With the underlying event included, the impact of the missing stable cones
in the midpoint algorithm reaches of the order of 10 to 15%, and thus starts to become
quite a significant effect. With Herwig, we find that the impact is a little smaller because its
underlying event is smaller than Pythia's at the LHC.



5.4.2 Jet masses in 3-jet events 

As well as the inclusive jet $p_t$ spectrum, we can also study more exclusive quantities. One
example is the jet-mass spectrum in multi-jet events. Jet-masses are potentially of interest
for QCD studies, particle mass measurements [34] and new physics searches, where they
could be used to identify highly boosted W/Z/H bosons or top quarks produced in the
decays of new heavy particles [35].

Figure 10: Mass spectrum of the second hardest jet as obtained with the different cone
algorithms on tree-level 4-particle events (generated with NLOJet): the plot shows the
relative difference between the midpoint and SISCone results. In the upper plot we consider
all three-jet events satisfying the transverse-momentum cuts, while in the lower plot (note
scale) we consider only those in which the second and third jets are separated by $\Delta R_{23} < 2R$.

Figure 11: Mass spectrum of the third hardest jet obtained from the different cone algorithms
run on three-jet Pythia events. The top-left (top-right) plot shows the spectrum in
linear (logarithmic) scale and the bottom plots show the relative difference between each
midpoint algorithm and SISCone. See the text for the details of the event selection.

The simplest multi-jet events in which to study jet masses are 3-jet events. There, the
masses of all the jets vanish at the 3-particle level. The first order at which the jet masses
become non-zero is $\mathcal{O}(\alpha_s^4)$ and this is also the order at which differences appear between
the midpoint and seedless cone algorithms. Therefore, as in section [5.4.1], we generate
$2 \to 4$ tree-level events, but now keep only those with exactly 3 jets with $p_t > 20$ GeV in
the final state. We further impose that the hardest jet should have a $p_t$ of at least 120
GeV and the second hardest jet a $p_t$ of at least 60 GeV. With these cuts we can compute
the jet-mass spectrum for each of the three jets and for the three different algorithms.
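The event selection just described can be written down compactly. The following is a minimal Python sketch, not the analysis code used for the figures; the function name and the representation of an event as a plain list of jet transverse momenta (in GeV) are assumptions made purely for illustration.

```python
def select_three_jet_event(jet_pts, pt_min=20.0, pt_lead=120.0, pt_sub=60.0):
    """Apply the 3-jet selection to one event.

    jet_pts : transverse momenta (GeV) of all jets found in the event
              (hypothetical input format, chosen for this sketch).
    Returns the three selected jet pts in decreasing order, or None if
    the event fails the selection.
    """
    # Keep only jets above the 20 GeV threshold, hardest first.
    selected = sorted((pt for pt in jet_pts if pt > pt_min), reverse=True)
    # Require exactly three such jets.
    if len(selected) != 3:
        return None
    # Hardest jet at least 120 GeV, second hardest at least 60 GeV.
    if selected[0] < pt_lead or selected[1] < pt_sub:
        return None
    return selected


# Usage: a softer fourth jet below threshold does not spoil the selection.
print(select_three_jet_event([130.0, 70.0, 25.0, 10.0]))
```

The jet-mass spectra are then filled only for events where the function returns a list, using the first, second and third entries to label the jets.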

In the upper plot of Figure [10], we show the relative difference "(midpoint(0) - SISCone)/SISCone"
for the mass spectrum of the second hardest jet. In the lower plot we
show the same quantity for events in which we have placed an additional requirement that
the $y$-$\phi$ distance between the second and third jets be less than $2R$ (such distance cuts are
often used when trying to reconstruct chains of particle decays). The midpoint algorithm's
omission of certain stable cones leads to an overestimate of the mass spectrum by up to
$\sim 10\%$ without a distance cut (much smaller differences are observed for the first and third
jet) and of over 40% with a distance cut. The problem is enhanced by the presence of the
distance cut because many more of the selected events then have three particles in a common
neighbourhood, and this is precisely the situation in which the midpoint algorithm
misses stable cones (cf. section [3]).
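To make the two quantities used in this comparison concrete, here is a small Python sketch of the $y$-$\phi$ distance cut and of the bin-by-bin relative difference. It is illustrative only: the function names, the `(y, phi)` tuple format for jets and the list-of-bin-contents format for histograms are assumptions, and empty SISCone bins are simply reported as `None`.

```python
import math


def delta_r(y1, phi1, y2, phi2):
    """Distance in the (y, phi) plane, with phi treated as periodic."""
    dphi = abs(phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(y1 - y2, dphi)


def passes_distance_cut(jet2, jet3, R=0.7):
    """True if the second and third jets, given as (y, phi), are closer than 2R."""
    return delta_r(jet2[0], jet2[1], jet3[0], jet3[1]) < 2.0 * R


def relative_difference(midpoint_bins, siscone_bins):
    """Bin-by-bin (midpoint - SISCone)/SISCone; None where SISCone is empty."""
    return [
        (m - s) / s if s != 0 else None
        for m, s in zip(midpoint_bins, siscone_bins)
    ]


# Usage with made-up bin contents (not values from the paper).
print(relative_difference([11.0, 7.0, 1.0], [10.0, 5.0, 0.0]))
```

A positive entry means that the midpoint algorithm overestimates the spectrum in that bin relative to SISCone.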

We emphasise also that the NLO calculation of these mass spectra would be impossible
with a midpoint algorithm, because the 10-40% tree-level differences would be converted
into an infrared divergent NLO contribution.

A general comment is that the problems seen here for the midpoint algorithm without
a distance cut are of the same general order of magnitude as the 16% failure rate in the
IR safety tests of section [5.1], suggesting that the absolute failure rates given there are a
good indicator of the degree of seriousness of issues that can arise in generic studies with
the infrared unsafe algorithms.

In addition to this fixed-order parton-level analysis, we have studied the jet masses in
3-jet events at hadron level (i.e. after parton showering and hadronisation) using events
generated with Pythia. At hadron level many more seeds are present, due to the large
particle multiplicity. One might therefore expect the midpoint algorithm to become a
good approximation to the seedless one.

For the mass of the second hardest jet, i.e. the quantity we studied at fixed order in
figure [10], we find that the midpoint and seedless algorithms do give rather similar results
at hadron level. In other words, differences that we see in a leading order calculation are
not propagated through to the full hadron level result. This is a serious practical issue 
for the midpoint algorithm, because a jet algorithm's principal role is to provide a good 
mapping between low-order parton level and hadron level. 

Nevertheless, despite the many seeds that are present at hadron level, we find that there
are still some observables for which the midpoint algorithm's lack of stable cones does have
a large impact even at hadron level. This is the case for the mass distribution of the
third hardest jet, shown in figure [11] (obtained without a distance cut) on both linear and
logarithmic scales so as to help visualise the various regions of the distribution. Moderate
differences are present in the peak region, but in the tail of the distribution they become
large, up to 50%. They are greater for midpoint(1) than for midpoint(0), because the seed
threshold causes fewer stable cones to be found with the midpoint(1) algorithm.
These results have been checked using the Herwig Monte-Carlo. We have observed
similar differences at parton-shower level, at the hadron level and at the hadron level 
including underlying event, both in the peak of the distribution and in the tail. We note 
that hadronisation corrections are substantial in the tail of the distribution, both for the 
midpoint and SISCone algorithms. 

The above results confirm what one might naturally have expected: while very inclusive 
quantities may not be overly sensitive to the deficiencies of one's jet algorithm, as one 
extends one's investigations to more exclusive quantities, those deficiencies begin to have 
a much larger impact. 

6 Conclusions 

Given the widespread use of cone jet algorithms at the Tevatron and their foreseen contin- 
ued use at LHC, it is crucial that they be defined in an infrared safe way. This is necessary 
in general so as to ensure that low-order parton-level considerations about cone jet-finding 
hold also for the fully showered, hadronised jets that are observed in practice. It is also a 
prerequisite if measurements are to be meaningfully compared to fixed order (LO, NLO, 
NNLO) predictions. 

The midpoint iterative cone algorithm currently in use is infrared unsafe, as can be seen 
by examining the sets of stable cones that are found for simple three-parton configurations. 
This may seem surprising given that the midpoint algorithm was specifically designed to 
avoid an earlier infrared safety problem — however the midpoint infrared problem appears 
at one order higher in the coupling, and this is presumably why it was not identified in the 
original analyses. The tests shown in section [5.1] suggest that the midpoint-cone infrared
safety problems, while smaller than without the midpoint, are actually quite significant
($\sim 15\%$).

We therefore advocate that where a cone jet algorithm is used, it be a seedless variant. 
For such a proposal to be realistic it is crucial that the seedless variant be practical. The 
approaches adopted in fixed order codes take $\mathcal{O}(N 2^N)$ time and are clearly not suitable in
general. Here we have shown that it is possible to carry out exact seedless jet-finding in expected
$\mathcal{O}(N n^{3/2})$ time with $\mathcal{O}(N n^{1/2})$ storage, or almost exact$^{20}$ in expected $\mathcal{O}(N n \ln n)$
time with $\mathcal{O}(N n)$ storage (we recall that $N$ is the total number of particles, $n$ the typical
number of particles in a jet). The second of these approaches has been implemented in a
C++ code named SISCone, available also as a plugin for the FastJet package. For $N \sim 1000$
it is comparable in speed to the existing CDF midpoint code with 1 GeV seeds. While this
is considerably slower than the $N \ln N$ and related FastJet strategies [20] for the $k_t$ and
Cambridge/Aachen jet algorithms, it remains within the limits of usability and provides
for the first time a cone algorithm that is demonstrably infrared and collinear safe at all
orders, and suitable for use at parton level, hadron level and detector level.

20 with a failure probability that can be made arbitrarily small and that we choose to be $< 10^{-18}$.

As well as being infrared safe, a jet algorithm must provide a faithful mapping between
expectations based on low-order perturbative considerations, and observations at hadron
level. There has been considerable discussion of worrisome possible violations of such a
correspondence for cone algorithms, the "$R_\mathrm{sep}$" issue. For SISCone we find however that
the correspondence holds well.

An obvious final question is that of the impact on physics results of switching from 
the midpoint to the seedless cone. For inclusive quantities, one expects the seedless cone 
jet algorithm to give results quite similar to those of the midpoint cone, because the IR 
unsafety of the midpoint algorithm only appears at relatively higher orders. This is borne 
out in our fixed order and parton-shower studies of the inclusive jet spectrum where we 
see differences between the midpoint and SISCone algorithms of about a couple of percent. 
At moderate $p_t$ at hadron level, the differences can increase to 5-10%, because SISCone
has a lower sensitivity to the underlying event, a welcome 'fringe-benefit' of the seedless 
algorithm. 

For less inclusive quantities, for example the distribution of jet masses in multi-jet 
events, differences can be significant. We find that for 3-jet events, the absence of some 
stable cones (i.e. infrared unsafety) in the midpoint algorithm leads to differences compared 
to SISCone at the $\sim 10\%$ level at leading order ($\alpha_s^4$) in a large part of the jet-mass spectrum.
Greater effects still, up to 50%, are seen with specific cuts at fixed order, and in the tails of 
the jet-mass spectra for parton-shower events. Thus, even if the infrared safety issues of the 
midpoint algorithm appear to be at the limit of today's accuracy when examining inclusive 
quantities, for measurements of even moderate precision in multi-jet configurations (of 
increasing interest at Tevatron and omnipresent at LHC), the use of a properly defined 
cone algorithm such as SISCone is likely to be of prime importance. 
A Further computational details

A.1 Cone multiplicities

In evaluating the computational complexity of (computational) algorithms for various 
stages of the cone jet algorithm it is necessary to know the numbers of distinct cones 
and of stable cones. Such information also constitutes basic knowledge about cone jet 
definitions, which may for example be of relevance in understanding their sensitivity to 
pileup, i.e. multiple pp interactions in the same bunch crossing. 

Since large multiplicities will be due to pileup, let us consider a simple model for the
event structure which mimics pileup, namely a set of momenta distributed randomly in $y$
and $\phi$ and all with similar $p_t$'s (or alternatively with random $p_t$'s in some limited range).

Given that the particles will be spread out over a region in $y$, $\phi$ that is considerably
larger than the cone area, in addition to $N$, the total number of particles, it is useful to
introduce also $n$, the number of points likely to be contained in a region of area $\pi R^2$.

The first question to investigate is that of the number of distinct cones. The number
of pairs of points that has to be investigated is $\mathcal{O}(Nn)$. However some of these pairs of
points will lead to identical cones. It is natural to ask whether, despite this, the number
of distinct cones is still $\mathcal{O}(Nn)$. To answer this question, one may examine how far one
can displace a cone in any given direction before its point content changes. The area swept
when moving a cone a distance $\delta R$ is $4R\,\delta R$, and the average number of points intersected
is $4\rho R\,\delta R$ where $\rho = \mathcal{O}(n/R^2)$ is the density of points (per unit area). Therefore the
distance moved before the cone edge is likely to touch a point is $\delta R = (4\rho R)^{-1} = \mathcal{O}(R/n)$.
Correspondingly the area in which one can move the centre of the cone without changing the
cone's contents is $\pi(\delta R)^2 = \mathcal{O}(R^2/n^2)$. Given that the total area is $\mathcal{O}(R^2 N/n)$ we have
that the number of distinct cones is $\mathcal{O}(Nn)$, the same magnitude as the number of relevant
point pairs.
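The chain of estimates in this paragraph can be collected in one place. The block below only restates the scalings given in the text; the symbols $A_\mathrm{tot}$ (total area covered by the particles), $A_\mathrm{cell}$ (area over which the cone centre can move without changing the cone's contents) and $N_\mathrm{cones}$ are shorthand introduced here and do not appear in the original.

```latex
\begin{align}
  \rho &= \mathcal{O}\!\left(\frac{n}{R^2}\right), \\
  4 \rho R \, \delta R \sim 1
    \;&\Rightarrow\;
    \delta R = (4 \rho R)^{-1} = \mathcal{O}\!\left(\frac{R}{n}\right), \\
  A_\mathrm{cell} &= \pi (\delta R)^2 = \mathcal{O}\!\left(\frac{R^2}{n^2}\right), \\
  N_\mathrm{cones} &\sim \frac{A_\mathrm{tot}}{A_\mathrm{cell}}
    = \frac{\mathcal{O}(R^2 N / n)}{\mathcal{O}(R^2 / n^2)}
    = \mathcal{O}(N n) .
\end{align}
```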

Let us now consider the number of stable cones. If we take a cone at random and sum
its momenta then the resulting momentum axis will differ from the original cone axis by an
amount typically of order $R/\sqrt{n}$ (since the standard deviation of $y$ and $\phi$ for the set of points
in the cone is $\mathcal{O}(R)$). The probability of the difference being $< R/n$ in both the $y$ and
$\phi$ directions (i.e. the probability that the new axis contains the same set of particles) is
$\sim (R/n)^2/(R/\sqrt{n})^2 \sim 1/n$. Therefore the number of stable cones is $\mathcal{O}(N)$. This assumes
a random distribution of particles. There may exist special classes of configurations for
which the number of stable cones is greater than $\mathcal{O}(N)$. Therefore timing results that are
sensitive to the number of stable cones are to be understood as "expected" results rather
than rigorous upper bounds.
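The $R/\sqrt{n}$ estimate for the axis shift can be checked numerically in a toy version of the pileup model. The Python sketch below is an illustration under simplifying assumptions that are not part of the original argument: all particles carry equal $p_t$ (so the momentum axis is just the centroid of the points), the $n$ points are drawn uniformly inside a disc of radius $R$, and the function name, seed and number of trials are arbitrary choices.

```python
import math
import random


def rms_axis_shift(n, R=0.7, trials=2000, seed=1):
    """RMS distance between a cone centre and the centroid of n points
    drawn uniformly inside the cone (equal-pt toy model)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sx = sy = 0.0
        for _ in range(n):
            # Uniform point in a disc: radius ~ R*sqrt(u), angle uniform.
            r = R * math.sqrt(rng.random())
            a = 2.0 * math.pi * rng.random()
            sx += r * math.cos(a)
            sy += r * math.sin(a)
        # Squared offset of the centroid from the cone centre at (0, 0).
        total += (sx / n) ** 2 + (sy / n) ** 2
    return math.sqrt(total / trials)


# Quadrupling n should roughly halve the typical axis shift.
shift_25 = rms_axis_shift(25)
shift_100 = rms_axis_shift(100)
print(shift_25, shift_100, shift_25 / shift_100)
```

In this toy model the shift scales as $1/\sqrt{n}$, so the printed ratio should come out close to 2, in line with the $R/\sqrt{n}$ behaviour used in the text.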

A.2 Computational complexity of the split-merge step

To study the computational complexity of the split-merge step, we work with the expectation
that there are $\mathcal{O}(N)$ initial protojets (as discussed above) and that there will be
roughly $N/n \ll N$ final jets (since there are $\mathcal{O}(n)$ particles per jet). It is reasonable to
assume that there will be roughly equal numbers of merging and splitting operations. Splitting
leaves the number of protojets unchanged, while merging reduces it by 1. Therefore
there will be $\mathcal{O}(N)$ split-merge steps before we reach the final list of jets.

There are three kinds of tasks in the split-merge procedure. Firstly one has to maintain
a list of jets ordered in $\tilde p_t$, both for finding the one with highest $\tilde p_t$ and for searching through
the remaining jets (in order of decreasing $\tilde p_t$) to find an overlapping one. Maintaining the
jets in order is easily accomplished with a balanced tree (for example a priority_queue
or multiset in C++), at a cost of $N \ln N$ for the initial construction and $\ln N$ per update,
i.e. a total of $N \ln N$, which is small compared to the remaining steps.

In examining the complexity of finding the hardest overlapping jet one needs to know
the cost of comparing two jets for overlap as well as the typical number of times this will
have to be done. A naive comparison of two jets takes time $n$. Using a 2d tree structure
such as a quadtree or k-d tree (as suggested also by Volobouev [H]), this can be reduced
to $\sqrt{n}$. The number of jets to be compared before an overlap is found will depend on the
event structure: if one assumes that jet positions are decorrelated with their $\tilde p_t$'s, then
$\mathcal{O}(N/n)$ comparisons will have to be made each time around the loop. The total cost of
this will therefore be $N^2/\sqrt{n}$ ($N^2$) with (without) a 2d tree.

Finally each merging/splitting procedure will take $\sqrt{n}$ ($n$) time with (without) a tree,
so the total time spent merging and splitting will be $\mathcal{O}(N\sqrt{n})$ (or $\mathcal{O}(Nn)$ without a tree).

The dominant step is the search for overlapping jets, which will have a total cost of
$N^2/\sqrt{n}$ (with a sizable coefficient), or $N^2$ without any 2d tree structures. Since in practice
$N^2$ is smaller than the $N n \ln n$ needed to find the stable cones, here the introduction of a
tree structure gives little overall advantage.
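For reference, the three contributions to the split-merge cost discussed above can be tallied side by side. The block below simply multiplies out the factors quoted in the text ($\mathcal{O}(N)$ loop iterations, $\mathcal{O}(N/n)$ overlap tests per iteration, and the per-operation costs); the labels $T_\mathrm{order}$, $T_\mathrm{overlap}$ and $T_\mathrm{s/m}$ are shorthand introduced here.

```latex
\begin{align}
  T_\mathrm{order} &\sim N \ln N , \\
  T_\mathrm{overlap} &\sim
    \underbrace{N}_{\text{iterations}} \times
    \underbrace{\frac{N}{n}}_{\text{tests per iteration}} \times
    \begin{cases} n & \text{(naive)} \\ \sqrt{n} & \text{(2d tree)} \end{cases}
    \;=\;
    \begin{cases} N^2 & \text{(naive)} \\ N^2/\sqrt{n} & \text{(2d tree)} \end{cases} , \\
  T_\mathrm{s/m} &\sim
    N \times
    \begin{cases} n & \text{(naive)} \\ \sqrt{n} & \text{(2d tree)} \end{cases}
    \;=\;
    \begin{cases} N n & \text{(naive)} \\ N \sqrt{n} & \text{(2d tree)} \end{cases} .
\end{align}
```

Since $n \le N$, the overlap search $T_\mathrm{overlap}$ is the largest of the three in either case.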

A final comment concerns memory usage: when not using any tree structures, the list of
protojets and their contents requires $\mathcal{O}(Nn)$ space, which is the same order of magnitude
as the storage needed for identifying the set of stable cones in the first place. With a tree
structure this can be reduced to $\mathcal{O}(N\sqrt{n})$.

B Proof of IR safety of the SISCone algorithm 

In this appendix, we shall explicitly prove that SISCone, algorithms [1-3], is infrared safe.
This means that if we run SISCone first with a set of hard particles, then with the same set 
of hard particles together with additional soft particles, then: (a) all jets found in the event 
without soft particles will be found also in the event with the soft particles; (b) any extra 
jets found in the event with soft particles will themselves be soft, i.e. they will not contain 
any of the hard particles. If either of these conditions fails in a finite region of phasespace 
for the hard particles, then the cancellation between (soft) real and virtual diagrams will 
be broken at some order of perturbation theory, leading to divergent jet cross sections. 

We will first discuss the proof using a simplifying assumption: two protojets with
distinct hard particle content have distinct values for the split-merge ordering variable,
$\tilde p_t$. We shall then discuss subtleties associated with various ordering variables, and explain
why $\tilde p_t$ is a valid choice.

B.1 General aspects of the proof



By soft particles, we understand particles whose momenta are negligible compared to the
hard ones. Specifically, for any set of hard particles $\{p_1, \ldots, p_n\}$ and any set of soft ones
$\{\bar p_1, \ldots, \bar p_m\}$, we consider a limit in which all soft momenta are scaled to zero, so that they
do not affect any momentum sums,

$$\lim_{\{\bar p_j\} \to 0} \left( \sum_{i=1}^{n} p_i + \sum_{j=1}^{m} \bar p_j \right) = \sum_{i=1}^{n} p_i . \qquad (7)$$

In what follows, the limit of the momenta of the soft particles being taken to zero will be
implicit.

Let us now compare two different runs of the cone algorithm: in the first one, referred to
as the "hard event", we compute the jets starting with a list of hard particles $\{p_1, \ldots, p_N\}$,
and, in the second one, referred to as the "hard+soft event", we compute the jets with the
same set of hard particles plus additional soft particles $\{\bar p_1, \ldots, \bar p_m\}$. As mentioned above,
the IR safety of the SISCone algorithm amounts to the statements (a) that for every jet
in the hard event there is a corresponding jet in the hard+soft event with identical hard 
particle content (plus possible extra soft particles) and (b) that there are no hard jets in 
the hard+soft event that do not correspond to a jet in the hard event. To prove this, we 
shall proceed in two steps: first, we shall show that the determination of stable cones is IR 
safe, then that the split-merge procedure is also IR safe. 

The IR safety of the stable-cone determination is a direct consequence of the fact that: 

• each cone initially built from the hard particles only was determined by two particles
in algorithm [2]. This cone is thus still present when adding soft particles and, because
of eq. ([7]), is still stable. Hence, all stable cones from the hard event are also present
after inclusion of soft particles, the only difference being that they also contain extra
soft particles which do not modify their momentum.

• no new stable cone containing hard particles can appear. Indeed, if a new stable
cone $S_\mathrm{new}$ appeared, with content $\{p_{\alpha_1}, \ldots, p_{\alpha_n}, \bar p_{\beta_1}, \ldots, \bar p_{\beta_m}\}$, then the fact that its
momentum $\sum_i p_{\alpha_i} + \sum_j \bar p_{\beta_j}$ corresponds to a stable cone implies, by eq. ([7]), that the
cone with just the hard momenta $p_{\alpha_i}$ is also stable. However, as shown in section [4.2],
all stable cones in the hard event have already been identified, therefore this cone
cannot be new.

From these two points, one can deduce that after the determination of the stable cones we 
end up with two different kinds of stable cones: firstly, there are those that are the same as 
in the hard event but with possible additional soft particles; and secondly there are stable 
cones that contain only soft particles. So, the 'hard content' of the stable cones has not 
been changed upon addition of soft particles and algorithm [2] is IR safe. 
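The insensitivity of cone stability to soft particles can be illustrated on a toy configuration. The following Python sketch is not the SISCone stable-cone search: it only tests whether a cone at a given trial centre is stable, and it uses a $p_t$-weighted $(y, \phi)$ centroid as a stand-in for the cone's momentum axis (an assumption made here, ignoring the periodicity of $\phi$). Particles are `(pt, y, phi)` tuples and all names are hypothetical.

```python
def axis(members, parts):
    """pt-weighted (y, phi) centroid of the given particle indices
    (simplified stand-in for the four-momentum axis)."""
    w = sum(parts[k][0] for k in members)
    y = sum(parts[k][0] * parts[k][1] for k in members) / w
    phi = sum(parts[k][0] * parts[k][2] for k in members) / w
    return y, phi


def cone_content(centre, parts, R=0.7):
    """Indices of the particles within distance R of the cone centre."""
    return frozenset(
        k for k, p in enumerate(parts)
        if (p[1] - centre[0]) ** 2 + (p[2] - centre[1]) ** 2 < R * R
    )


def is_stable(centre, parts, R=0.7):
    """A cone is stable if a cone centred on the axis of its contents
    contains exactly the same particles."""
    members = cone_content(centre, parts, R)
    return bool(members) and cone_content(axis(members, parts), parts, R) == members


def hard_part(members, n_hard):
    """Restrict a set of indices to the hard particles (indices < n_hard)."""
    return frozenset(k for k in members if k < n_hard)


# Two hard particles, then the same event with one very soft particle added.
hard = [(100.0, 0.0, 0.0), (80.0, 0.8, 0.0)]
hard_soft = hard + [(1e-9, 0.9, 0.3)]

for centre in [(0.4, 0.0), (0.8, 0.0)]:
    print(centre,
          is_stable(centre, hard), sorted(cone_content(centre, hard)),
          is_stable(centre, hard_soft), sorted(cone_content(centre, hard_soft)))
```

For both trial centres the cone stays stable when the soft particle is added, and `hard_part` of its contents is unchanged, which is the behaviour the two bullet points above establish in general.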

The main idea behind the proof of the IR safety of the split-merge process, algorithm [3],
is to show by induction that the hard content of the protojets evolves in the same way for
the hard and hard+soft event. Since the hard content is the same at the beginning of the
process, it will remain so all along the split-merge process which is what we want to prove.

There is however a slight complication here: when running algorithm [3] over one itera- 
tion of the loop in the hard event, we sometimes have to consider more than one iteration 
of the loop in the hard+soft event. As we shall shortly see, in that case, only the last of 
these iterations modifies the hard content of the jets and it does so in the same way as in 
the hard event step. 

So, let us now follow the steps of algorithm [3] in parallel for the hard and hard+soft
event, and show that they are equivalent as concerns the hard particles. In the following
analysis, item numbers coincide with the corresponding step numbers in algorithm [3].

El If $p_{t,\min}$ is non-zero, all purely soft protojets will be removed from the hard+soft
event and by eq. ([7]) the same set of hard protojets will be removed in the hard and
hard+soft event. Thus the correspondence between the hard protojets in the two
events will persist independently of $p_{t,\min}$.

[3j In general, protojets with identical hard content will have nearly identical $\tilde p_t$ values,
whereas protojets with different hard-particle content will have substantially different
$\tilde p_t$ values.$^{21}$ Therefore the addition of soft particles will not destroy the $\tilde p_t$ ordering
and the protojet with the largest $\tilde p_t$ in the hard event, $i$, will have the same hard
content as the one in the hard+soft event (let us call it $i'$).

[3j The selection of the highest-$\tilde p_t$ protojet $j$ ($j'$ in the hard+soft case) that overlaps with
$i$ ($i'$) can differ in the hard and hard+soft events, and we need to consider separately
the cases where this does not, or does happen. The first case, C1, is that $i'$ and $j'$
overlap in their hard content; because of the common $\tilde p_t$ ordering, $j'$ must then
have the same hard content as $j$. The second case, C2, is that $i'$ and $j'$ only overlap
through their soft particles, so $j'$ cannot be the 'same' jet as $j$ (since $j$ by definition
overlaps with $i$ through hard particles). By following the remaining part of the loop,
we shall show that in the first case all modifications of the hard content are the same
in the hard and hard+soft events, while, for the second case, the iteration of the loop
in the hard+soft event does not modify any hard content of the protojets. In this
second case, we then proceed to the next iteration of the loop in the hard+soft event
but stay at the same one for the hard event.

C1: The two protojets $i'$ and $j'$ overlap in their hard content

I3ll3l We need to compute the fraction of $\tilde p_t$ shared by the two protojets. Since the
hard contents of $i$ ($j$) and $i'$ ($j'$) are identical, the fraction of overlap, given
by the hard content only, will be the same in the hard and hard+soft events.
Hence, the decision to split or merge the protojets will be identical.

21 As mentioned already, this point is more delicate than it might seem at first sight. We come back to 
it in the second part of this appendix. 
[3j Since the centres of both protojets are the same in the hard and hard+soft
events, the decision to attribute a hard particle to one protojet or the other will 
be the same in both events. Hence splitting will reorganise hard particles in the 
same way for the hard+soft event as for the hard one. 

El In both the hard and the hard+soft events, the merging of the two protojets 
will result in a single protojet with the same hard content. 

C2: The two protojets $i'$ and $j'$ overlap through soft particles only

l3|3l Since the fraction of $\tilde p_t$ shared by the protojets will be zero in the limit eq. ([7]), the
two protojets will be split.

EJ In the splitting, only shared particles, i.e. soft particles, will be reassigned to
the first or second protojet. The hard content is therefore left untouched, as is
the $\tilde p_t$ ordering of the protojets.

El At the end of the splitting/merging of the overlapping protojets, we have to consider
the two possible overlap cases separately: in the first case, the hard contents of the
protojets are modified in the same way for the hard and hard+soft event. This case
is thus IR safe. In the second case, the iteration of the loop in the hard+soft event
does not correspond to any iteration of the loop in the hard event. However the hard
content of the protojets in the hard+soft event is not modified and the $\tilde p_t$ ordering of
the jets remains identical; at the next iteration of the hard+soft loop, the new $j'$ may
once again have just soft overlap with $i'$ and the loop will thus continue iterating,
splitting the soft parts of the jets, but leaving the hard content of the jets unchanged.
This will continue until $j'$ corresponds to the $j$ of the hard event, i.e. we encounter
case 1.$^{22}$ Therefore even though we may have gone around the loop more times in
the hard+soft event, we do always reach a stage where the split-merge operation in
the hard+soft event coincides with that in the hard event, and so this part of the
procedure is infrared safe.

[3131 Up to possible intermediate loops involving case 2 above, when the protojet $i$ has no
overlapping protojets in the hard event, the corresponding $i'$ in the hard+soft event
has no overlaps either. Final jets will thus be added one by one with the same hard
content in the hard and hard+soft events.

This completes the proof that the SISCone algorithm is IR safe, modulo subtleties related 
to the ordering variable, as discussed below. Regarding the 'merge identical protojets' 
(MIP) procedure: 

22 Note that the second case can only happen a finite number of times between two occurrences of the
first case: as the $\tilde p_t$ ordering is not modified during the second case, each time around the loop the overlap
will involve a $j'$ with a lower $\tilde p_t$ than in the previous iteration, until one reaches the $j'$ that corresponds
to $j$.

[3j In algorithm [3], we do not automatically merge protojets appearing with the same
content during the split-merge process. This is IR safe. If instead we allow for two
identical protojets to be automatically merged, then when two protojets have the
same hard content but differ as a result of their soft content, they are automatically
merged in the hard event but not in the hard+soft event. This in turn leads to IR
unsafety of the final jets.
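The step-by-step argument above can be followed on a toy example. The Python sketch below implements a simplified split-merge loop in the spirit of the procedure discussed in this appendix; it is not the SISCone code. Assumptions made here: $p_{t,\min} = 0$ (so the threshold-removal step is omitted), protojets are supplied by hand as sets of particle indices rather than found by a stable-cone search, the ordering and overlap variable is the scalar $p_t$ sum, the protojet axis is a $p_t$-weighted $(y, \phi)$ centroid with no $\phi$ periodicity, and the list is re-sorted at each iteration instead of being kept in a tree. All names are hypothetical.

```python
def ptilde(jet, parts):
    """Scalar sum of particle pts: the ordering variable used in this sketch."""
    return sum(parts[k][0] for k in jet)


def axis(jet, parts):
    """pt-weighted (y, phi) centroid (simplified protojet axis)."""
    w = ptilde(jet, parts)
    return (sum(parts[k][0] * parts[k][1] for k in jet) / w,
            sum(parts[k][0] * parts[k][2] for k in jet) / w)


def dist2(p, ax):
    return (p[1] - ax[0]) ** 2 + (p[2] - ax[1]) ** 2


def split_merge(protojets, parts, f=0.5):
    """Toy split-merge with overlap threshold f; returns the final jets."""
    jets = [set(j) for j in protojets]
    final = []
    while jets:
        jets.sort(key=lambda j: ptilde(j, parts), reverse=True)
        i = jets[0]                                    # hardest protojet
        idx = next((m for m in range(1, len(jets)) if i & jets[m]), None)
        if idx is None:                                # no overlap: final jet
            final.append(frozenset(jets.pop(0)))
            continue
        j = jets[idx]                                  # hardest overlapping one
        shared = i & j
        if ptilde(shared, parts) < f * ptilde(j, parts):
            # Split: each shared particle goes to the closer axis.
            ax_i, ax_j = axis(i, parts), axis(j, parts)
            for k in shared:
                if dist2(parts[k], ax_i) <= dist2(parts[k], ax_j):
                    j.discard(k)
                else:
                    i.discard(k)
        else:
            # Merge: replace the two protojets by their union.
            merged = i | j
            del jets[idx]
            del jets[0]
            jets.append(merged)
    return final


def hard_content(jets, n_hard):
    """Hard-particle content of each jet, dropping purely soft jets."""
    out = [frozenset(k for k in j if k < n_hard) for j in jets]
    return [h for h in out if h]


# Hard event: three particles (pt, y, phi); protojets written by hand.
hard = [(100.0, 0.0, 0.0), (80.0, 0.8, 0.0), (50.0, 1.9, 0.0)]
# Hard+soft event: one extra soft particle shared by two protojets.
hard_soft = hard + [(1e-9, 1.35, 0.0)]

jets_hard = split_merge([{0, 1}, {0}, {1}, {2}], hard)
jets_soft = split_merge([{0, 1}, {0}, {1, 3}, {2, 3}], hard_soft)
print(jets_hard)
print(jets_soft, hard_content(jets_soft, 3))
```

In the hard+soft run the soft particle produces one extra loop iteration in which two protojets overlap only through it (case C2): they are split, the soft particle is reassigned, and the hard content of the final jets is the same as in the hard run.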

A final comment concerns collinear safety and cocircular points. When defining a
candidate cone from a pair of points, if additional points lie on the edge of the cone, then
there is an ambiguity as to whether they will be included in the cone. From the geometrical
point of view, this special case of cocircular points (on a circle of radius $R$) can be treated
by considering all permutations of the cocircular points being included or excluded
from the circle contents. SISCone contains code to deal with this general issue. The case
of identically collinear particles, though a specific example of cocircularity, also adds the
problem that a circle cannot properly be defined from two identical points. For explicit
collinear safety we thus simply merge any collinear particles into a single particle, step [2]
of algorithm [2]. Given the resulting collinear-safe set of protojets, the split-merge steps
preserve collinear safety, since particles at identical $y$-$\phi$ coordinates are treated identically.

B.2 Split-merge ordering variable

Suppose we use some generic variable $v$ (which may be $p_t$, $E_t$, $m_t$, $\tilde{p}_t$, etc.) to decide the
order in which we select protojets for the split-merge process. A crucial assumption in the
proof of IR safety is that two jets with different hard content will also have substantially
different values for $v$, i.e. the ordering of the $v$'s will not be changed by soft modifications.
If this is not the case then the choice of the hard protojets that enter a given split-merge 
loop iteration can be modified by soft momenta, with a high likelihood that the final jets 
will also be modified. 

At first sight one might think that whatever variable is used, it will have different values 
for distinct hard protojets. However, momentum conservation and coincident masses of 
identical particles can introduce relations between the kinematic characteristics of distinct 
protojets. Some care is therefore needed so as to ensure that these relations do not lead 
to degeneracies in the ordering, with consequent ambiguities and infrared unsafety for the 
final jets. In particular: 

• Two protojets can have equal and opposite transverse momenta if between them they 
contain all particles in the event (and the event has no missing energy or 'ignored' 
particles such as isolated leptons). It is probably fair to assume that no two protojets 
will have identical longitudinal components, since in pp collisions the hard partonic 
reaction does not occur in the pp centre of mass frame. 

• Two protojets will have identical masses if they each stem exclusively from the same
kind of massive particle. The two massive particles may be undecayed (e.g. fully
reconstructed $b$-hadrons) or decayed (top, W, Z, H, or some non-standard new
particle), or even one decayed and the other not (some hypothetical particle with
a long lifetime).$^{23}$ In the second case we can assume that two identical decayed
particles have different decay planes, because there is a vanishing phase space for
them to have identical decay planes.

Note that in a simple two-parton event almost any choice of variable will lead to a degeneracy
(no sensible invariant will distinguish the two particles); however, this specific case
is not problematic because for $R < \pi/2$ neither of the two partons can be in a protojet
that overlaps with anything else. From the point of view of IR safety, it is only for 'fat'
(non-collimated) hard protojets that we need worry about the problem of degeneracies
in the split-merge ordering, because only then will there be overlaps whose resolution is
ambiguous in the presence of degeneracies.

Let us now consider what occurs with various possible choices for the split-merge 
variable. 

$p_t$: This choice, adopted in certain codes [13, 19], can be seen to have a problem for events
with momentum conservation in the hadronic part, because if two non-overlapping
protojets contain, between them, all the hard particles then they will have identical
$p_t$'s. If they each overlap with a common third protojet, the resulting split-merge
sequence will be ambiguous. Table 4 provides an example of such an event. The
simplest occurrences of this problem (4h + 1s, i.e. four hard particles and one soft
particle) apply only to $R > \pi/4$ (four particles must form at least 3 fat protojets).
The problem arises also for smaller $R$ values, but only at higher multiplicities.

$m_t$: A workaround for the event of table 4 is to use the transverse mass, $m_t = \sqrt{p_t^2 + m^2}$.
In pure QCD, with all particles stable, this is a good variable, because even if two
fat protojets have identical $p_t$'s through momentum conservation, the fact that they
are 'fat' implies that they will be massive (over and above intrinsic particle masses),
and the phase space for them to have identical masses vanishes, thus killing any
IR divergences. However, for events with two identical decaying particles, two fat
protojets resulting from the particle decays can have identical $p_t$'s (by momentum
conservation) and identical masses (because the decaying particles were identical).
This could happen for example in the fully hadronic decay channel for $t\bar{t}$ events.
Thus, this choice is not advisable in a general purpose algorithm.

$E_t$: The variable used in the original run II proposal was $E_t$ [5]. It has the drawback that
it is not longitudinally boost invariant: at central rapidity it is equal to $m_t$, while
at high rapidities it tends to $p_t$. Because the phase space for two protojets to have
identical rapidities vanishes (recall that we do not fix the partonic centre-of-mass),
two protojets with identical $p_t$'s and masses will have different $E_t$'s, because the

23 Strictly speaking, for all scenarios of decayed heavy particles, the finite width $\Gamma$ of the particle ensures
that the two jets actually have slightly different masses, breaking any degeneracies. In practice however,
$\Gamma_{W,Z,t} \sim 1$ GeV and (for a light Higgs) $\Gamma_H \ll \Lambda_{QCD}$, whereas for the width to save us from the dangers of
degeneracies we would need $\Gamma \gg \Lambda_{QCD}$.

            event 1                            event 2
 n      p_x     p_y     p_z         n      p_x     p_y     p_z
 0    86.01      66       0         0    85.99      66       0
 1       64     -66       0         1       64     -66       0
 2      -77     -70       0         2      -77     -70       0
 3      -73      70       0         3      -73      70       0
 4    -0.01       0       2         4     0.01       0       2

Table 4: Illustration of two events that conserve transverse momentum and differ only
through a soft particle, but lead to different hard jets with a split-merge procedure that
uses $p_t$ as the ordering variable and for measuring overlap. All the particles are to be taken
massless. For $R = 0.9$ and $f = 0.7$ each event has stable cones consisting of {01}, {23}
and {12}, as well as all single particles. The slight difference in momenta between the two
events, to balance the soft particle, causes the {01} ({23}) protojet to have the largest $p_t$
in the first (second) event; it splits with {12} (merges with {12}), leading after further
split-merge steps to two hard jets, {01} and {23} (one hard 'monster' jet, {0123}).



degree of 'interpolation' between $p_t$ and $m_t$ will be different. This resolves
the degeneracy and should cure the resulting IR safety issue, albeit at the expense
of introducing boost-dependence.

$\tilde{p}_t$: The scalar sum of transverse momenta of the protojet constituents, $\tilde{p}_t$, has the property
that it is equal to $m_t$ if all particles in the protojet have identical rapidities,
while it is equal to $p_t$ (i.e. the vector sum) if all particles have identical azimuths.
For a decayed massive particle, it essentially interpolates between $p_t$ and $m_t$ according
to the orientation of the decay plane. The phase space for all particles to have
identical azimuths vanishes, as does the phase space for the decay products of two
heavy particles to have identically oriented decay planes. Therefore this choice resolves
any degeneracies, as is needed for infrared safety. Another advantage of $\tilde{p}_t$ is
that adding a particle to a protojet always increases its $\tilde{p}_t$ (this is not the case for $p_t$
or $E_t$), ensuring that the degree of overlap between a pair of jets is always bounded
by 1. Since it is also boost invariant, it is the choice that we recommend and that
we adopt as our default.$^{24}$
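
The behaviour of the four candidate variables can be checked numerically on the events of Table 4. The following is a minimal sketch, not SISCone code. It assumes the Table 4 momenta with $p_z = 0$ for the four hard particles and $p_y = 0$ for the soft one, takes $E_t = E \sin\theta$ as the definition of transverse energy, and uses hypothetical helper names throughout.

```python
import math

def particle(px, py, pz, m=0.0):
    """Return (px, py, pz, E) for a particle of mass m (massless by default)."""
    return (px, py, pz, math.sqrt(px * px + py * py + pz * pz + m * m))

def ordering_variables(protojet):
    """Candidate split-merge ordering variables for a list of four-momenta."""
    px, py, pz, e = (sum(p[i] for p in protojet) for i in range(4))
    pt = math.hypot(px, py)                      # vector-sum transverse momentum
    m2 = max(e * e - pt * pt - pz * pz, 0.0)     # invariant mass squared
    mt = math.sqrt(pt * pt + m2)                 # transverse mass
    et = e * pt / math.sqrt(pt * pt + pz * pz)   # assumed definition: E sin(theta)
    pt_tilde = sum(math.hypot(p[0], p[1]) for p in protojet)  # scalar sum
    return {"pt": pt, "mt": mt, "Et": et, "pt_tilde": pt_tilde}

def boost_z(protojet, rapidity):
    """Longitudinally boost every four-momentum by the given rapidity."""
    ch, sh = math.cosh(rapidity), math.sinh(rapidity)
    return [(px, py, pz * ch + e * sh, e * ch + pz * sh) for px, py, pz, e in protojet]

def make_event(px0, soft_px):
    """Table 4 event: only particle 0 and the soft particle 4 differ between events."""
    rows = [(px0, 66, 0), (64, -66, 0), (-77, -70, 0), (-73, 70, 0), (soft_px, 0, 2)]
    return [particle(*row) for row in rows]

def first_protojet(event, key):
    """Name of the hard protojet that the ordering variable `key` ranks first."""
    protojets = {"01": event[0:2], "12": event[1:3], "23": event[2:4]}
    return max(protojets, key=lambda name: ordering_variables(protojets[name])[key])

event1, event2 = make_event(86.01, -0.01), make_event(85.99, 0.01)
for key in ("pt", "mt", "Et", "pt_tilde"):
    print(key, first_protojet(event1, key), first_protojet(event2, key))

# Boost dependence: compare the variables of {01} before and after a boost.
jet = event1[0:2]
print(ordering_variables(jet))
print(ordering_variables(boost_z(jet, 2.0)))
```

With these inputs the $p_t$ ordering selects {01} first in event 1 and {23} first in event 2, so the soft particle changes the starting point of the split-merge sequence, whereas $m_t$, $E_t$ and $\tilde{p}_t$ select {23} in both events. The boosted comparison shows $E_t$ moving from $m_t$ towards $p_t$ while $m_t$ and $\tilde{p}_t$ are unchanged.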

Note that the above considerations hold for any split-merge procedure that relies on ordering
the jets according to a single-jet variable. One might also consider ordering according
to variables determined from pairs of protojets: e.g. first split-merge the pair of protojets
with the largest (or alternatively smallest) overlap, recalculate all overlaps, and then repeat
until there are no further overlaps. However, this specific example would also be dangerous,

24 One might worry about the naturalness of a variable that depends on the decay plane of heavy particles;
however, any unnaturalness is present anyway in the split-merge procedure, since if two particles decay
purely in the transverse plane then there is a likelihood of having overlapping protojets, whereas if they
decay in longitudinally oriented decay planes they will not overlap.

since the particles that are common to protojets $a$ and $b$ (say) could also be the particles
that are common between $a$ and $c$, once again leading to an ambiguous split-merge sequence.
One protojet-pair ordering variable that might be free of this problem is the $y$-$\phi$
distance between the protojets; however, we have not investigated it in detail.

A final comment concerns the impact of the split-merge procedure on non-global [36]
resummations for jets [37], in which one is interested in determining which of a set of
ordered soft particles are in a given hard jet. A soft and collinear splitting inside the jet
can modify the $p_t$ (or $E_t$ or $m_t$) of the jet by an amount of the same order of magnitude
as a soft, large-angle emission near the edge of the jet. In events with two back-to-back
narrow jets, for which there is a near degeneracy between the $p_t$'s of the two hard jets,
this can affect which of the two hard protojets split-merges first with an overlapping soft
protojet, leading to ambiguities in the assignment of the soft particles to the two hard jets.
This interaction between collinear and soft modes is somewhat reminiscent of that in [38],
though the origin and structure are kinematical in our case. Considering only branchings
with transverse momenta above $\epsilon\, p_{t,\mathrm{hard}}$, for $R > \pi/4$ this is likely to be relevant in events
with two equally soft particles ($\alpha_s^2 \ln \epsilon$) and $n$ soft-collinear splittings ($\alpha_s^n \ln^{2n} \epsilon$) giving an
overall contribution $\alpha_s^{n+2} \ln^{2n+1} \epsilon$. This competes with the normal soft-ordered non-global
logarithms, starting from order $\alpha_s^3 \ln^3 \epsilon$. For $R < \pi/4$, the problem will only arise with a
greater number of equally soft large-angle particles, and so will be further suppressed by
powers of $\alpha_s$.
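
The power counting above can be written out explicitly. This is a sketch that assumes the two factors simply multiply, with $\alpha_s$ the strong coupling and $\epsilon$ the transverse-momentum cut relative to the hard scale:

```latex
\[
  \underbrace{\alpha_s^{2}\,\ln\epsilon}_{\text{two equally soft particles}}
  \;\times\;
  \underbrace{\alpha_s^{n}\,\ln^{2n}\epsilon}_{n\ \text{soft-collinear splittings}}
  \;=\; \alpha_s^{n+2}\,\ln^{2n+1}\epsilon ,
  \qquad
  n = 1:\quad \alpha_s^{3}\,\ln^{3}\epsilon .
\]
```

For $n = 1$ the contribution is of the same order, $\alpha_s^3 \ln^3 \epsilon$, as the lowest-order soft-ordered non-global term quoted in the text.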